The Structure of Genealogies in the Presence of Purifying 
Selection: A "Fitness-Class Coalescent" 



Aleksandra M. Walczak 1 -*, Lauren E. Nicolaisen 2 -*, Joshua B. Plotkin 3 , and Michael M. 
Desai 2 

CNRS-Laboratoire de Physique Theorique de VEcole Normale Superieure, 

2 Department of Organismic and Evolutionary Biology, Department of Physics, and 
FAS Center for Systems Biology, Harvard University 

3 Department of Biology, University of Pennsylvania 
* These authors contributed equally to this work 

(Dated: May 30, 2011) 

Abstract 

Compared to a neutral model, purifying selection distorts the structure of 
genealogies and hence alters the patterns of sampled genetic variation. Al- 
though these distortions may be common in nature, our understanding of 
how we expect purifying selection to affect patterns of molecular variation 
remains incomplete. Genealogical approaches such as coalescent theory 
have proven difficult to generalize to situations involving selection at many 
linked sites, unless selection pressures are extremely strong. Here, we intro- 
duce an effective coalescent theory (a "fitness-class coalescent" ) to describe 
the structure of genealogies in the presence of purifying selection at many 
linked sites. We use this effective theory to calculate several simple statis- 
tics describing the expected patterns of variation in sequence data, both at 
the sites under selection and at linked neutral sites. Our analysis combines 
our earlier description of the allele frequency spectrum in the presence of 
purifying selection (Desai et al, 2010) with the structured coalescent ap- 
proach of Nordborg (1997), to trace the ancestry of individuals through 
the distribution of fitnesses within the population. Alternatively, we can 
derive our results using an extension of the coalescent approach of Hudson 
and Kaplan (1994). We find that purifying selection leads to patterns of 
genetic variation that are related but not identical to a neutrally evolving 
population in which population size has varied in a specific way in the past. 
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INTRODUCTION 



Purifying selection acting simultaneously at many linked sites ( "background selection" ) can 
substantially alter the patterns of molecular variation at these sites, and at linked neutral 
sites (Etheridge and Griffiths, 2009; Etheridge et a/., 2010; Gordo et a/., 2002; 
Hahn, 2008; Hill and Robertson, 1966; Hudson and Kaplan, 1994, 1995; Kaplan 
et a/., 1988; McVean and Charlesworth, 2000; O 'Fallon et a/., 2010; Seger et a/., 
2010). In recent years, evidence from sequence data points to the general importance of 
weak selective forces among many linked variants in microbial and viral populations, and 
on short distance scales in the genomes of sexual organisms (Comeron et al, 2008; Hahn, 
2008; Seger et al, 2010). In these situations, existing theory does not fully explain patterns 
of molecular evolution (Hahn, 2008). 

It is difficult to incorporate negative selection at many linked sites into genealogical 
frameworks such as coalescent theory, since these frameworks typically rely on characterizing 
the space of possible genealogical trees before considering the possibility of mutations at 
various locations on these trees. When selection operates, the probabilities of particular 
trees cannot be defined independently of the mutations, and the approach breaks down 
(Tavare, 2004; Wakeley, 2009). 

Despite this difficulty, a number of productive approaches have been developed to pre- 
dict how negative selection influences patterns of molecular variation and to infer selection 
pressures from data. Charlesworth et al. (1993) showed that strong purifying selec- 
tion reduces the effective population size relevant for linked neutral sites (Charlesworth, 
1994; Charlesworth et al, 1995). However, weaker selection also distorts patterns of vari- 
ation, in a way that cannot be completely described by a neutral model with any effective 
population size (Comeron and Kreitman, 2002; McVean and Charlesworth, 2000) 
- a phenomenon often referred to as Hill-Robertson interference (Hill and Robertson, 
1966). Several theoretical frameworks have been developed to analyze this situation. The 
ancestral selection graph of Neuhauser and Krone (1997) and Krone and Neuhauser 
(1997) provides an elegant formal solution to the problem, but unfortunately it requires 
extensive numerical calculations (Przeworski et al, 1999). These limit the intuition we 
can draw from this method, and make it impractical as the basis for inference from most 
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modern sequence data. An alternative approach is based on the structured coalescent of 
Nordborg (1997), which views the population as subdivided into different fitness classes 
and traces the genealogies of individuals as they move between classes. This approach was 
first introduced by Kaplan et al (1988) and Hudson (1990) and further developed by 
Hudson and Kaplan (1994) and Hudson and Kaplan (1995). It has been the basis for 
computational methods developed by Gordo et al (2002) and Seger et al (2010) and 
analytical approaches such as those of Barton and Etheridge (2004), Hermisson et al 
(2002), and O 'Fallon et al (2010). 

In this paper, we build on the structured coalescent framework by introducing the idea 
of a "fitness-class coalescent." Rather than considering the coalescence process in real time, 
we treat each fitness class as a "generation" and trace how individuals have descended by 
mutations through fitness classes, moving from one "generation" to the next by subsequent 
mutations. We show that the coalescent probabilities in this fitness-class coalescent can 
be computed using an approach based on the Poisson Random Field method of Sawyer 
and Hartl (1992), or equivalently can be exactly derived as an extension of the structured 
coalescent approach of Hudson and Kaplan (1994). 

Our fitness-class coalescent theory can be precisely mapped to a coalescence theory in 
which certain quantities (e.g. coalescence times) have different meanings than in the tradi- 
tional theory. We can then invert this mapping to determine the structure of genealogies and 
calculate statistics describing expected patterns of genetic variation. This approach requires 
certain approximations, but it also has several advantages. Most importantly, we are able 
to derive relatively simple analytic expressions for coalescent probabilities and distributions 
of simple statistics such as heterozygosity. Consistent with earlier work, we find that the 
effects of purifying selection are broadly similar to an effective population size that changes 
as time recedes into the past. Our analysis makes this analysis precise and quantitative: 
we can compute the exact form of this time-varying effective population size. We also show 
that this intuition has important limitations: for example, different pairs of individuals have 
different time- varying effective population size histories, meaning that in principle it may 
be possible to distinguish selection from changing population size. Our approach also makes 
it possible to calculate the diversity created at the selected sites themselves, which may be 
important when selection is common. 
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We begin in the next section by describing the fitness-class coalescent idea which under- 
lies our approach. We then describe the details of our model and analyze two alternative 
ways to implement the fitness-class coalescent. The first relies on the framework developed 
in Desai et al (2010) to calculate the frequency distribution of distinct lineages within 
each fitness class. This provides a simple intuitive framework for computing the structure of 
genealogies, but is algebraically involved. The second approach is based on tracing paths in 
the order that events occur as described by Hudson and Kaplan (1994), and implemented 
numerically by Gordo et al (2002). This approach has the advantage of algebraic simplic- 
ity, and it provides a correspondence between our analytical results and earlier structured 
coalescent methods. However, it is unwieldy to generalize to other types of selection and is 
less intuitive in certain respects. We show how both approaches can be used to analyze the 
structures of genealogies, and we calculate various statistics describing genetic variation in 
these populations, which we compare to numerical simulations. We finally discuss the rela- 
tionship between our results, neutral theory, and earlier work on selection, and we explore 
how various approximations limit our approach. The most important of these approxima- 
tions is that we neglect Muller's ratchet. We discuss this and related approximations briefly 
in the next section, and justify their regime of validity in more detail in the Discussion. 

THE FITNESS-CLASS COALESCENT 

We begin in this section by outlining the main ideas underlying our approach. We begin our 
analysis by considering the balance between mutations at many linked sites and negative 
selection against the mutants, which leads to an equilibrium distribution of fitnesses within 
a population (Haigh, 1978). We illustrate this in Fig. 1, for the case in which all deleterious 
mutations have the same fitness cost. Each individual is characterized by the number k of 
deleterious mutations it contains. Each fitness class k contains many genetically distinct 
lineages, each of which arose from mutations in more-fit individuals, as illustrated in Fig. 2. 

Hudson and Kaplan (1994) observed that individual lineages move between fitnesses 
by mutations, and that when two individuals are in the same fitness class they could be from 
the same lineage and hence coalesce. Our fitness-class coalescent exploits this observation to 
define an effective genealogical process that completely bypasses the ancestral process in real 
time. Instead, we treat each fitness class as a "generation," and we count time in deleterious 
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mutations: each deleterious mutation moves us from one "generation" to the next. In this 
way, we can trace the ancestry of individuals through the fitness distribution. For example, 
there is some probability that two individuals chosen from fitness class k are genetically 
identical (i.e. come from the same lineage). If not, they each arose from mutations within 
fitness class k — 1. If both those mutations occurred in individuals in the same lineage in 
fitness class k — 1, we say the two individuals "coalesced" in class k — 1. If not, they came 
from different mutations from class k — 2, and could have coalesced there, and so on. In 
this way, we can construct a fitness-class coalescent tree describing the relatedness of two 
individuals, as illustrated in Fig. 2. 

In this paper we show that the probability that two randomly chosen individuals who are 
currently in fitness classes k and k f coalesce in class k — I, p^ k/ ^ k ~ £ ^ i s approximately 

pk,k'^k-t 1 j\k,k' ^ 

^k-t^k-t 

where is the population size of fitness class fc, is an effective selection pressure against 
these individuals, and 

A k,k' _ \k-i) \k-tj 



( k+k' \ 
\2i+k f -k) 



(2) 



This coalescent probability is inversely proportional to the population size of the fitness 
class, rik-t, and the effective selection coefficient within that class, Sk-i ) modified by the 
combinatoric coefficient A k,k . As we will see, this has a clear intuitive interpretation. Fitness 
class k — £ has size so the coalescence probability per real generation is We will 

see that each lineage spends of order s^-i generations in that class, so the total coalescence 

probability in this class has the form — — . This is multiplied by A k / k /2, which we will 

show describes the probability that the two individuals are in class k — £ at the same time. 
In other words, the probability coalescence occurs in a class equals the inverse population 
size of the class times the number of generations lineages spend together in that class. In the 
following sections of this paper we derive Eq. 1 in the two alternative ways mentioned in the 
Introduction: by explicitly considering the lineage frequency distribution and by following 
the path summation method of Hudson and Kaplan (1994), Gordo et al (2002), and 
Barton and Etheridge (2004). 

Calculating statistics describing sequence variation: Our approach of treating 
mutation events as timesteps, and computing coalescence probabilities at each timestep, 
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allows us to make a precise mapping to coalescence theory in which certain quantities have 
a different meaning than in the traditional theory. In this framework, we can calculate a 
simple analytic expression for the probability two lineages sampled from particular fitness 
classes will coalesce in any other fitness class. These fitness-class coalescence probabilities 
allow us to explicitly calculate the structure of genealogies in this "mutation time." We 
can then compute the distribution of any statistic describing expected sequence variation 
by averaging over the fitness classes our original individuals come from. For a statistic x 
that depends on genealogies between two individuals, for example, we write expressions of 
the form 

P(x) = ^#(M0 Prob [M' coalesce in k - t]P(x\k,k' ',£), (3) 

where H(k,k f ) describes the probability two individuals sampled at random from the pop- 
ulation come from classes k and k! respectively. 

From the form of these expressions and our simple result for the coalescence probabilities, 
we can immediately see the main effect of selection on the structure of genealogies. The 
discussion following Eq. (1) implies that the effect of negative selection is similar to that of 
an effective population size that changes as time recedes into the distant past — i.e. some 
N e (t). This intuition has been suggested by earlier work (see e.g. Seger et al (2010)). As 
we will see, our analysis describes the precise form of N e (t): it follows the distribution rik-t 
as I increases further to the past, modified by the coefficient A^ k . We will also see that 
this picture of time- varying population size has limits: different pairs of individuals have a 
different N e (t). As is clear from Eq. (3), these different histories are averaged according 
to the distribution H{k,k'). While it is the average N e (t) between pairs that determines 
the distribution of pairwise statistics, this lack of a single N e (t) describing all individuals 
means that statistical power may exist in larger samples to distinguish negative selection 
from neutral population expansion. We explore these general conclusions of our analysis in 
detail in the Discussion. 

Note that in the standard neutral coalescent, one first calculates the distribution of coa- 
lescence times and then imagines mutations occurring as a Poisson process throughout the 
coalescent tree, with rates proportional to branch lengths. In our fitness-class coalescent, 
by contrast, the coalescence times are the mutations. To avoid confusion, from here on we 
will refer to the effective "generations" in our model as "steps," and refer to the fitness-class 
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coalescent "times" as the "steptimes." We will reserve the word "time" to refer to the actual 
coalescent time, measured in actual generations. 

After determining a fitness-class coalescent tree, we can invert our mapping to determine 
the structure of genealogies in real time. We will do this by calculating how the steptime 
in our fitness-class coalescent model translates into an actual time in generations. This will 
allow us to relate the distribution of branch lengths in steptimes to an actual coalescent 
tree in generations. We can then treat neutral mutations as is usually done in the standard 
coalescent: as a Poisson process with probabilities proportional to branch lengths. 

Our fitness-time coalescent requires a number of approximations which limit its applica- 
bility. Most importantly, we neglect Muller's ratchet, and more generally ignore the effects 
of fluctuations in the size of each fitness class. We have considered these approximations 
in Desai et al (2010), and return to consider them in more detail in the Discussion. We 
find that within a broad and biologically relevant parameter regime they lead to systematic 
but small corrections to our results. Despite these limitations, our approach also has several 
advantages relative to previous work. The fitness-time coalescent approach makes many 
otherwise difficult analytic calculations tractable, allows us to compute the diversity at the 
selected sites in addition to linked neutral sites, and may offer a useful basis for practical 
methods of coalescent simulation and inference. 

MODEL 

We now turn to the details of our model, which is identical to the model we studied in Desai 
et al (2010). We imagine a finite haploid population of constant size N. Each individual 
has a genome composed of a large number of sites. Each site is assumed to begin in some 
ancestral state, and can mutate with some constant rate. Each mutation is assumed to be 
either neutral or to confer some fitness disadvantage s (where by convention s > 0). We 
work within an infinite-sites approximation, where the probability that two mutations at 
the same site segregate simultaneously within the population is negligible. 

We assume that there is no epistasis for fitness, so each deleterious mutation contributes 
multiplicatively to the fitness of each individual. We assume that all deleterious mutations 
carry the same fitness cost s, and that s <C 1, so that the fitness of an individual with k 
deleterious mutations is approximately Wk — 1 — sk. 
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The dynamics of competing individuals are assumed to follow the diffusion limit of the 
standard Wright-Fisher model. In each generation an individual acquires a new deleterious 
mutation, somewhere in its genome, with probability Thus, Od/2 = NUd is the per- 
genome scaled deleterious mutation rate. Similarly, neutral mutations occur at a rate U n 
per individual per generation, and we define 9 n /2 = NU n . Whenever a mutation arises, 
it is assumed to arise at site for which there are no other segregating polymorphisms in 
the population (the infinite-sites assumption). We focus exclusively on the case of perfect 
linkage, where we imagine that all the sites we are considering are in an asexual genome 
or within a short enough distance in a sexual genome that recombination can be entirely 
neglected. Although our model is defined for haploids, this assumption means that our 
analysis also applies to diploid populations provided that there is no dominance (i.e. being 
homozygous for the deleterious mutation carries twice the fitness cost as being heterozygous). 
In this case, our model is equivalent to that considered by Hudson and Kaplan (1994). 

For the bulk of this paper, we will assume that Muller's ratchet can be neglected. While 
this assumption presented minimal problems in the context of the allele-based analysis in 
Desai et al (2010), it is more problematic here. Thus we will return to the question of the 
importance of Muller's ratchet in more detail in the Discussion. 

We believe that our model is the simplest possible null model based on a concrete picture 
of mutations at individual sites that can describe the effects of a large number of linked 
negatively selected sites on patterns of genetic variation. In Desai et al (2010) we discuss 
in more detail its relationship with other models which have been introduced in earlier 
related work. 

ALLELIC DIVERSITY IN THE DELETERIOUS 
MUTATION-SELECTION BALANCE 

Our analysis aims to develop a fitness-class coalescent theory that involves tracing the an- 
cestry of individuals as they change in fitness by acquiring deleterious mutations. In order 
to do this, we need to first understand the distribution of fitnesses within the population 
and the structure of lineage diversity amongst individuals within a given fitness class. We 
have analyzed these topics in detail in Desai et al (2010). Here we briefly summarize the 
results relevant for our subsequent coalescent analysis. 
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In our model all deleterious mutations have the same fitness cost s, and so we can classify 
individuals based on their Hamming class, fc, relative to the wildtype (which by definition has 
k — 0). That is, individuals in class k have k deleterious mutations more than the most-fit 
individuals in the population. Note that not all individuals in class k have the same set of k 
deleterious mutations. Furthermore, k refers only to the number of deleterious mutations an 
individual has; individuals with the same k can have different numbers of neutral mutations. 
We normalize fitness such that by definition all individuals in class k = have fitness 1. 
Individuals in class k then have fitness 1 — ks (Fig. 1). 

We showed in Desai et al. (2010) that the balance between mutation and selection leads 
to a steady state in which the fraction of the population in fitness class fc, which we call 
is given by a Poisson distribution with mean Ud/s, 



k\ V s 

This is consistent with the earlier work by Haigh (1978), and means that the average fitness 
in the population is 1 — U d , and that k = ^f. 

We will later need to understand the distributions of timings, Ql~ l (t), at which an 
individual mutates from class k — 1 to class k. We can calculate this by noting that the 
probability that an individual in class k arose from a mutation in an individual in class k — 1 
rather than a reproduction event from an individual in class k is 

NU d h k - X 



Nh k (l-U d ) + NU d h k -i 
Substituting in the steady state values for the h k: this becomes 

1 1 



(5) 



sk (6) 



This means that we have 

Q k k -\t) = ske- skt . (7) 

Note that this calculation is identical to the equivalent distribution of mutation timings 
computed by Gordo et al (2002) following the approach of Hudson and Kaplan (1994). 

We now consider the lineage structure within the mutation-selection balance. Consider a 
fitness class k ) which has an overall frequency hk (Fig. lb). The frequency hk is maintained 
by a stochastic process in which the class is constantly receiving new individuals from class 
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k — 1 due to mutations. In our infinite-alleles approximation, each such mutation creates 
a lineage which is an allele that is unique within the population. Each lineage fluctuates 
in frequency for a while before eventually dying out, perhaps after acquiring additional 
mutations that found new lineages in fitness class k + 1. At any given moment, there is 
some frequency distribution of lineages in each class k (see Fig. 2). While the identity of 
these lineages changes over time, there is a probability distribution that at any moment there 
is a given frequency distribution of lineages. In steady state, this probability distribution 
does not change with time. 

In Desai et al (2010), we calculated this steady state probability distribution of the 
frequency distribution of lineages. For our purposes here, it is most useful to consider these 
results in the absence of neutral mutations; we will consider the diversity at neutral sites 
separately below. In the absence of neutral mutations, we noted that new lineages are 
founded in class k at a rate 0^/2, where 



We refer to s k as the effective selection coefficient against an allele in class fc, because it is 
the rate at which any particular lineage in class k loses individuals, and we defined 



Our model then reduces to the situation studied by the Poisson Random Field model of 
Sawyer and Hartl (1992) and Hartl and Sawyer (1994). Thus the frequency distri- 
bution of lineages (alleles) in fitness class k follows a Poisson Random Field (PRF) with 
effective parameters 9 k and 7^. That is, the number of distinct lineages in class k with a 
frequency between a and b (relative to the total size of the fitness class Nh k ) is Poisson 
distributed with mean 



9 k = 2Nh k _ 1 U d . 



(8) 



These individuals are then removed from class at a per capita rate 



Sk = ~U d ~ s(k - k). 



(9) 



7 fc = Ns k . 



(10) 




(11) 



where 



fk(x) 



6 k 1 - e" 2 ^ 1 -^ 



(12) 



x(l - x) 1 - e -2 ^ 
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This is equivalent to saying that the probability that there exists a lineage in class k with 
frequency (in the entire population) between x and x + dx is fk{x)dx ) for infinitesimal dx. 

Note that this analysis involves various implicit approximations, and the results are valid 
within a specific parameter regime. We describe these approximations and limitations in 
detail in Desai et al (2010). Most importantly, our approach neglects the fact that although 
each fitness class will have an average size in a finite population there will be fluctuations 
around this h k . Furthermore, our PRF analysis neglects the fact that there is a correlation 
between the size of a lineage and the size of a fitness class conditional on that lineage existing. 
We analyzed these approximations in Appendix B of Desai et al (2010), and described in 
detail the parameter regimes in which they are valid. Note that all of the results we describe 
below include the corrections for correlations detailed in that Appendix. In the Discussion, 
we return to discuss in more detail a key aspect of this approximation — that we neglect 
the effects of Muller's ratchet — which is particularly relevant for the present work. 

Most importantly for our subsequent analysis, note that our Poisson Random Field result 
implies that on average the sum of all the frequencies of all the alleles in fitness class k is 
simply 



which implies that the frequency of the fitness class within the total population is and 
that the probability that two individuals chosen at the same time at random from fitness 
class k both come from the same lineage is 



We are now in a position to calculate the degree of relatedness between two individuals 
sampled from the population. Our goal is to understand the probability distribution of the 
fitness-class coalescence steptimes for two individuals chosen at random from the population. 
We begin by calculating the coalescence probability in each step. For now we neglect neutral 
mutations entirely and focus on formulating the fitness-class coalescent framework; we defer 
the calculations of neutral diversity to a later section. In this section we focus on the PRF- 
based method for calculating coalescent probabilities; we present an alternative derivation 




(13) 




(14) 



THE FITNESS-CLASS COALESCENT PROBABILITIES 
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based more directly on the method of Hudson and Kaplan (1994) in the next section. 

First, imagine that by chance we pick two individuals from the same fitness class k. This 
class has a total frequency hk as given in Eq. (4), and within the class there is a probability 
f k (x) as given in Eq. (12) that there exists a lineage with frequency x. Thus there is 
probability 

P^ k = fx 2 h(x) (15) 
Jo 

that these two chosen individuals come from the same lineage (note this expression contains 
the same implicit approximations as our calculation of Q 2 in Desai et al (2010)) If so, they 
are genetically identical and the coalescence steptime is 0. If not, we want to calculate the 
probability they coalesce in class k — 1, p^ k -^ k ~ 1 m If the lineage of individual A in class k 
was founded by a mutation from class k — 1 a time t\ ago, and the lineage of individual B in 
class k was founded by a mutation a time t 2 a g°> the probability the two individuals came 
from a common lineage in class — 1 is 

P c ^ = I dxdydt^Qt^M f^ yGk ~ liy ? *' h ~ tll) - (16) 

J tlk-l flk-1 

Here Qkj^itiih) is the joint distribution of t\ and i 2 , an d Gk-i(y —> x, \h — h\) is the 
probability a lineage in class k — 1 changes in frequency from x to y in time |t 2 — ti\ (where 
y could be 0, corresponding to a lineage that has already mutated back to class k — 2 by the 
time the second individual mutates to class k — 1). We return to the forms of these functions 
below. 

Note that all of these expressions assume that the distribution h k is constant in time. 
This is the same assumption we used in calculating fk(x). As we showed in Desai et al 
(2010), this is a good approximation in class k provided that Nh k sk 3> 1. As in Desai et al 
(2010), we only require in practice that this condition hold in the classes in the bulk of the 
fitness distribution; it can fail near the tails of the distribution without affecting our results 
because by definition only a very small fraction of the population are found in these tails. 
Note however that for the purposes of the present paper, certain additional complications 
can arise from fluctuations in hk in the high-fitness tail of the distribution, leading to Muller's 
ratchet. We neglect these ratchet effects here, but return to address them in the Discussion. 
These formulas also assume that the probability a single lineage represents a substantial 
fraction of the size of a fitness class can be neglected. We discussed a correction for this 
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effect in Appendix B of Desai et al (2010), and all of the results described below include 
this correction. 

If the two individuals coalesced in this first step, the coalescent steptime is 1. If not 
(which occurs with probability 1 — pM^-i^ we have to consider the probability they 
coalesce at the next step (i.e. in the mutations that took them from class k — 2 to k — 1). 
This probability is 



= f d X dyd tl dt 2 Qt?( h ,t 2 f h - 2{X) yGk ~ 2{y ? ^ ~ (17) 

Here t\ is the time the ancestor of individual A in class k mutated from class k — 2 to 
k— 1, and analogously for £ 2 ; Q^i^i^i) i s the joint distribution of these times, and /^_ 2 (x) 
and Gk-2 are defined as above. If the two individuals did not coalesce in this step, we can 
continue in the same vein and calculate p k ^ k ~ 3 ^ and so on. 

So far we have imaged that both individuals that we originally selected from the popula- 
tion came from the same class k. This will not generally be true. Rather, when we pick two 
individuals at random, they will come from classes k and k 1 with probability 



H(k,k') = < 



2h k h k > ifk^k 

(18) 

h\ if k = k f 



For convenience we choose k < k f . We define p k ^ k '^ k ~ i to be the probability that two 
individuals from classes k and k' coalesce in class k — £. Note that p k ^ k, ^ k ~ i = Q for £ < 0. 
For t > we have 

I**** = I dxdydt^Qt^M f't'^ VGt -' {V ^ *' " 2 " ■ (W) 

Of course the fact that k! > k means that typically t\ will be larger than £ 2 , and have a 
broader distribution. 

From the set of coalescence probabilities Eq. (19), we can calculate the probability 
distribution of coalescence steptimes between two individuals. We describe these steptimes 
by the distribution of classes in which coalescence occurs; given that we pick two individuals 
from classes k and k f (with k < k! by convention) the probability that they coalesce in class 
k — £ is simply 

£-1 

<i>k(t) = p^ k, ^ k - £ n [i - p*.*'->*-'] . (20) 
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Note that this expression contains a subtle approximation: if two lineages coalesce in class 
k — £ they were more likely to have coexisted in class k — £ + 1 and hence slightly more 
likely to have coalesced there than we have accounted for. We neglect this effect here; it is 
closely related to the nonconditional approximation discussed in more detail below and in 
Appendix A. We also note that, assuming that the probability that three lineages coalesce 
in a given step is negligible, we can in principle calculate the distribution of coalescent tree 
shapes and branch lengths in steptimes for a sample of any number of individuals. 

Computing the Coalescence Probabilities: We now have a formal structure describ- 
ing the structure of coalescent genealogies in the presence of negative selection. It remains, 
however, to evaluate the coalescent probabilities in each step, and to use these probabilities 
to calculate the probability distribution of genealogies. 

We begin by noting that the coalescent probabilities all depend on the transition prob- 
ability for the change in the frequency of a lineage from x to y in a time \t\ — t 2 \ in class 
k — I, Gk-i(y —> x, \t 2 — ti\). This transition probability was calculated by KlMURA (1955) 
and can be expressed as an infinite sum of Gegenbauer polynomials. Fortunately, it always 
appears in the context of an integral 

I G = J yG k -t(y x, |*2 - h\)dy, (21) 

which is simply the average of y over Gk-i- Hence this integral is given by the deterministic 
result for the change in the frequency of the lineage, 

I G = xe-< k -^-^. (22) 

This simple expression for Iq makes our approach analytically tractable. 

We now begin by evaluating the probability that two individuals chosen from fitness class 
k coalesce in class k — 1. Applying Eq. (22) to Eq. (16), we have 

J {hk-iV 
Since the two individuals mutated independently from class k — 1, we have Q^^i, £2) = 
Qt'itM-'ih), where Q k k ~ l (t) is given by Eq. (7). This gives 

p fc >fc - fc _i = j dx x 2 h-i(x) [ dtidt2 ( sk y ew [_ s ^ ti+t ^_ s ( k _ 1 y ti _ t2 \} (24 ) 
J n k _ x J 
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We can do the time integral by ordering t\ and £ 2 , and find it gives ^k-i) ' ^ x integral 
is more complex; we discussed integrals of this form in Appendices A and B of Desai et al 
(2010) and found that 

/W-^4*-'= 1 + 2 ^ ( ,_ <r (25) 
Plugging in this result, we have 

1 h 

pk,k^k-i 1 z (of)) 

l + 2Nh k - 1 s(k-l)2k-l' K } 

We now wish to calculate the probability two individuals both chosen from fitness class 
k coalesce in an arbitrary class k — £. First consider the probability of coalescence in class 
k — 2. This is given by 

p fc >fc _ fc _2 = J t2 f 2 fk^(x) ^ [ _ s(fc _ 2)| ^ _ ^ dtidhdx (2?) 

= I k x ~ 2 J Qk7k(ti,t 2 ) exp [s(k - 2)\h - t 2 \] dhdh. (28) 

The time t\ is now the sum of the time for one individual to have mutated from class k — 2 
to class k — 1 plus the time for it to have mutated from class k — 1 to class k ) and analogously 
for i 2 - However, in order for the two lineages to coalesce in class k — 2, they must not have 
coalesced in class k — 1. We refer to the probability distribution of the times when these 
individuals mutated from class k — 1 to class k conditional on them not having coalesced in 
class k — 1 as Ql~ k 1 (ti,t 2 \nc). We discuss this full calculation in Appendix A. Here we make 
use of a simpler approximation: since the coalescence probability in each step will turn out 
to be small, conditioning on not coalescing in class k—1 does not shift the distribution of 
mutation timings much. To be precise, Q^^i, t2\nc) differs from Q^ 1 (t^Q 1 ^ 1 (t 2 ) only by a 
factor proportional to p^ k -^ k ~ 1 m I n what follows, we will therefore neglect the complications 
associated with the probability distributions of the mutant timings conditional on non- 
coalescence, and use the simpler distributions of unconditional timings. Note that by a 
similar token we have also implicitly neglected the fact that coalescence did not occur in class 
k in computing the distribution of mutation timing relevant for computing the probability 
of coalescence in class k — 1. We refer to this as the non-conditional approximation, and 
discuss its validity further in Appendix A. 
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In the non-conditional approximation, the probability that two individuals both chosen 
from fitness class k coalesce in an arbitrary class k — £ is 

ptj^k-t = J Q k-i^ ^h-'W e-tk-Wi^dt^dx, (29) 

where in our approximation Qk~k(ti^2) is the unconditional distribution of the times at 
which the two individuals sampled in class k originally moved from class k — £ to class 
k — £ + 1 by acquiring a deleterious mutation. Since t\ and i 2 are independent in the non- 
conditional approximation, we have Q k ^{t\^t<2) = Qk~ i (ti)Qk~ i (^'2)- We calculate these 
distributions of mutant timings Ql~ £ (t) in Appendix B. Plugging these in, and evaluating 
the integrals as described in Appendix C, we find 

pk,k^k-e _ 1 V) (on) 

l + 2Nh k . e8 (k-i)(^y { ) 

wh ere ft) = why. 

This is our final result for the coalescence probability in class k — £ of two individuals 
chosen from the same class k. Note that the dependence on the parameters of the evolu- 
tionary process is entirely contained in the factor 1+2 Nhk is(fc-i) ' Thus ^ e result Eq. (30) is 
simply 

r>k,k^k—£ \ Ak /q-i \ 

n ~ i + 2Nh k . e s(k-e) £l [6 ' 

where A\ is a numerical coefficient which depends on k and £ but not on the population 
parameters. 

This general form for the coalescence probabilities makes intuitive sense. Nh k _£ is the 
population size of class k — I, and s ^_^ is the average number of generations that an 
individual spends in class k — £ before mutating away. Since the per-generation coalescent 
probability in a population of size n is proportional to ^ , it makes sense that the coalescent 
probability in class k — £ is approximately proportional to one over the population size of 
this class times the number of generations individuals spend in this class. The additional 1 
in the denominator captures the fact that the individuals might mutate away from the class 
before coalescing there (which reduces the average time they spend in the class together). 
The numerical factor multiplying this basic scaling, A\ comes from the integrals over the 
probability distribution of mutant timings (i.e. the dt\ and dt2 integrals). It reflects the 
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probability that the ancestors of the two individuals we are considering were both in class 
k — £ at the same time, since they could not otherwise coalesce there. The factor of 2s (k-£) 
is the average amount of time that the two individuals spend together in class k — £ given 
that they are ever in that class at the same time. 

From this result, we can also form an intuitive picture of the shape of genealogies in the 
presence of negative selection. We have just seen that coalescence probability per actual 
generation depends on the parameters as N * , where the relevant value of £ increases as 
we go back in time. Thus the structure of genealogies in the presence of negative selection 
is similar to having a variable population size as we go back in time. The precise nature of 
this variable population size is encoded in the fitness distribution hk-£. For example, if we 
imagine sampling two individuals from the same below- average fitness class, the probability 
distribution of their genealogies is like having a population size that initially increases and 
then decreases as we look backwards in time. Of course, this analogy only goes so far. Most 
importantly, the coalescent steptimes are related to the statistics describing genetic diversity 
in a different way from how normal coalescent times are usually related to these statistics. 
Further, in general we will not happen to sample two individuals in the same fitness class, 
a complication we now turn to. 

General coalescence probabilities in the non-conditional approximation: Thus 
far we have focused on the coalescence probabilities starting from a sample of two individuals 
from the same fitness class k. However, when we sample two individuals from the population 
at random, it is likely that they come from different fitness classes. In general, the probability 
that two individuals sampled at random from the population come from classes k and k' 
respectively is if (fc, fc') 3 as defined in Eq. (18). 

Given that we sample two individuals from classes k and k' ) where by convention we 
choose k! > k, the coalescence probability in the non-conditional approximation is 



pk^u-t = f g^(t 1 )g^(t 2 )-^/ fc _,(x)e- s ^)i* 1 -* 2 iM 1 ^ 2 . (32) 

J ill. D 



c 2 

I 



We substitute in our expressions for Q k (t) and evaluate the integrals in Appendix C; we 
find 

r>k,k'^>k-£ _ \ j\k,k' /qq\ 

^ ~ l + 2Nh k . e s(k-£) " ' 1 j 
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where 

W = (34) 

\2l+k'-k) 

Eq. (33) is the complete solution for coalescent probabilities in the non-conditional 
approximation. As in the previous subsection, the parameter dependence is simple and the 
probability of coalescence in a given fitness class is proportional to the inverse population 
size of the fitness class and the time an average individual spends in that fitness class. This 
is multiplied by a fc, k' ) and ^-dependent numerical factor which decreases as /c' — /c increases, 
reflecting the fact that the larger fc 7 — A; is, the less likely the ancestors of the two sampled 
individuals are to have been in a given fitness class at the same time. The dependence 
of Af k on t is more complex, but reflects the probability that the ancestors of the two 
individuals we are considering were in class k — £ at the same time. 

In Fig. 3 we show examples of coalescence probabilities calculated from our theoretical 
framework within the non-conditional approximation for different population parameters. 
We see that the probability of coalescence steadily increases for longer steptimes (classes 
with larger fitness), and decreases with increasing selection coefficients and population size. 

A SUM OF ANCESTRAL PATHS APPROACH 

Our analysis thus far has focused on using the lineage structure within each fitness class to 
determine the coalescence probabilities. Hudson and Kaplan (1994) proposed a somewhat 
different way to look at the same problem: they considered a sample of individuals and, 
without explicitly describing lineage structure, computed the relative probabilities that the 
next event to occur backwards in time would involve a mutation or coalescent event. For 
example, if two individuals are in the same fitness class, the next event could be either 
coalescence within that class or a mutation event. The rates at which these events occur 
determines their relative probabilities. In this manner, Hudson and Kaplan (1994) were 
able to generate a recursion relation for the mean time to a common ancestor, their Eq. (12). 
Gordo et al (2002) used this equation as the basis for a coalescent simulation. A similar 
logic was used earlier in by Kaplan et al. (1988) to develop analogous diffusion equations 
for the transition probabilities between states; Barton and Etheridge (2004) developed 
this approach to compute the effect of selection on genealogies in a two-locus system. 
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Recursion relations of the Hudson and Kaplan (1994) form can be solved numerically, 
and have been used to generate data describing coalescent statistics, but have not yet led to 
an analytic description of the structure of genealogies in the presence of negative selection 
at many linked sites. We now demonstrate that these numerical methods are equivalent 
to our lineage-based formalism above, by showing that the Hudson and Kaplan (1994) 
approach can be used to derive identical analytical formulas for the coalescent probabilities. 
We refer to this as a "sum of ancestral paths" approach, because it relies on summing over 
all possible paths of individual ancestry through the fitness distribution. The equivalence of 
this approach to our lineage-structure calculations means that our analytical results in this 
paper match earlier numerical and simulation results based on the Hudson and Kaplan 
(1994) formulation. 

In order to calculate the coalescence probabilities for a sample of two individuals, we 
consider the set of all possible ancestral paths these individuals may have followed. Each 
path is represented by an ordered set of events, backwards in time. These events may either 
be deleterious mutation events, which move one of the ancestral lineages to the previous 
fitness class, or coalescence events, which merge the two ancestral lineages. In the absence 
of back mutations, the ancestral lineages may only move toward higher-fitness classes, such 
that movement through the distribution is irreversible. As a consequence, in order for 
two individuals to coalesce in class k — I, each ancestral lineage must undergo a series of 
deleterious mutation events, bringing them from their initial classes to class k — £. The 
lineages must then coalesce before any additional deleterious mutations occur. 

For example, in order for two individuals sampled from class k to coalesce in class k — 1, 
the first event, backwards in time, must be a deleterious mutation. This mutation can occur 
in either individual. After this event, one of the ancestral lineages is still in class fc, while 
the other is in class k — 1. The second event, backwards in time, must be a deleterious 
mutation event in the ancestral lineage that remains in class k. Both ancestral lineages are 
now in class k — 1. Finally, the third event must be a coalescent event. Note that there are 
a total of two paths, since either individual may have been the first to mutate. 

In general, in order for two individuals sampled from classes k 1 and k to coalesce in class 
k — I, the first k! — k + 2£ events must consist of k! — k + £ deleterious mutation events in 
the ancestral lineage that began in k' \ and £ deleterious mutation events in the ancestral 
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lineage that began in k. The final event must then be a coalescent event. Note that there 
are a total of ~^ +2 ^j possible paths, reflecting the number of ways to order the mutation 
events in one lineage with those in the other. To calculate the coalescence probability, we 
sum the probabilities of each path that results in this particular coalescence event. 

The probabilities of each event: The probability of each path is the product of 
the probability of each event in the path. In order to determine the probability of each 
event, we first consider the rates. As in our lineage structure approach, we neglect neutral 
mutations for now; we will consider their effects in a later section below. We saw above 
that the distribution of times since a deleterious mutation occurred in an individual in class 
k is Qk-i(t) = ske~ skt . If the two individuals are in different classes, they are not able to 
coalesce. Therefore, the probability of each event is simply: 

P(lst Event is Del. Mut. in k|fc, k f ) = sk (35) 

Srv ~\~ Srv 

sk' 

P(lst Event is Del. Mut. in k'|fc, k f ) = — -. (36) 

Srv ~\~ Srv 

If the two individuals are in the same class, the next event may either be a coalescent event 
or a deleterious mutation. Within each class, coalescence is a neutral process that occurs 
with rate 1/Nh k . Therefore, we have that: 

P(lst Event is CoaL|fc, k) = V^f/L, n = ~, ^ 7 7 = ^ (37) 

v 1 ; sk + sk + l/(Nh k ) l + 2Nh k sk x v ; 

P(lst Event is Del. Mut.|fc, k) = — — y 2 ^ //7vr , , = - 2 ^rf i = 1 " ^ ( 38 ) 
v 17 sk + sk + l/(Nh k ) l + 2Nh k sk x v 7 

These probabilities are analogous to those used by Gordo et al. (2002), and similar ex- 
pressions can be derived as a simple extension of the analysis of Barton and Etheridge 
(2004). 

The sum over possible ancestral paths: Using these probabilities, we now calculate 
the probability of coalescence in a given class. First, consider sampling two individuals from 
the same fitness class k. In order for these two individuals to coalesce in class fc, the first 
event must be a coalescent event. Thus we have: 

equivalent to our earlier lineage-based result. In order for these individuals to coalesce in 
class k — 1, the first event must be a deleterious mutation event. Since both individuals' 
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ancestral lineages are currently in class k ) the probability the first event is a deleterious 
mutation event is 1 — 1%. After this event, there is now one ancestral lineage in class k — 1, 
and one in class k. The next event must be a deleterious mutation in the latter, which occurs 
with probability ^rzi- Finally, the third event must be a coalescent event. This implies 

^l(l) = (l-/M- 1 2jfc3i- ( 4 °) 

Note that this logic has given us an expression for the probability that the coalescent steptime 
is 1, 01(1), and not the probability of coalescence in this class given that coalescence has not 
yet occurred, p^ k ^ k ~ £ ^ because we have already included the probability that the coalescence 
event does not happen in class £. 

We can continue to extend this logic to subsequent fitness classes. For example, for 
coalescence to occur in class k — 2, there are six possible paths. We can label them as AABBc, 
BBAAc, ABABc, ABBAc, BABAc, and BAABc, where A corresponds to a mutation in the 
first individuals' ancestral lineage, B corresponds to a mutation in the second individuals' 
ancestral lineage, and c corresponds to a coalescent event. We can calculate the probability 
of each path. For example, 

The probability of path BBAAc is identical, since it has the same probabilities at each 
step. However, the remaining four paths have a different probability, because the ancestral 
lineages exist together in the k — 1 class at the same time. This distorts the probability of 
mutations at that step, since coalescence could also have occurred. For paths of this type, 
we have 

^-M^X^X^X^k 2 ™ 

We add up each path to find 

«*< 2 > = I "- \ i 2k k -m k -3) ( 2 1 1 - ^ + 4 ( x - 7 *) ( x - 7 *~ 1 )) <*» 

k _ 2 3k(k -I) f 1 _ I k_ 2 7 *_i + 2 jfc^-A (44) 



x 2(2k- l)(2fc-3) V x 3 x 3 x x / 

It is informative to consider the form of this result. The l^T 2 factor is the probability that 
the two ancestral lineages coalesce in class k — 2, given that they existed in class k — 2 at the 
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same time. The remaining factors represent the probability that the two ancestral lineages 
existed at the same time in class k — 2. This consists of a leading order term ^ 2 k-i){2k-?>) 
(identical to our earlier result for A\ =2 ) ) multiplied by a correction due to the distortion in 
paths from the possibility of coalescence in previous steps. 

We can continue on to consider the probability of coalescence in class k — 3. There are now 
a total of (^j possible paths. These can be split into four types, depending upon whether 
the two ancestral lineages coexisted in both classes k — 1 and k — 2 (e.g. ABABABc), in 
class k — 1 only (e.g. ABAABBc), in class k — 2 only (e.g. AABBABc), or in neither 
(e.g. AAABBBc). The probability of each type of path is identical, except for a distortion 
factor (1 — for each class k — i in which the two ancestral lineages were together at 

the same time. The probabilities can be calculated as before, and summed to yield 0^(3). 
Using similar logic, we can extend this approach to the situation where two individuals are 
sampled from different classes, k 1 and k. 

In Appendix D, we describe the details of carrying out this summation over all possible 
paths to determine the coalescent probabilities. We find 



\k'-k+2£) 



£ _! fk'-k+2i\ (2t-2i\ 

(k'-k+2£\ 2 x ! 
*=0 I £ ) 
2i-\ (k'-k+2i\ (2j-2i\ (2£-2j\ 
^ V i J \ 3-i J V 1-3 J jk-ij 



EE 

i=o j>i \ £ ) 



(k'-k+2£^ 



(46) 



where as always we have assumed k < k! by convention. The form of this solution is 
intuitive. The factor is the probability of coalescence in class k — t, given that the two 
ancestral lineages existed in this class at the same time. The remaining factors reflect the 
probability that the two lineages are together in class k — £ at some point. This consists of a 
leading order term, which is identical to the A*' k calculated previously, times a correction. 
The correction represents the distortion in the paths due to the possibility that coalescence 
could have occurred at previous steps. There are a total of / + 1 terms in the correction, 
each of which is known and calculable. 

Fortunately, provided 2Nh k sk 3> 1, we can neglect the higher-order terms in Eq. (46). 
This is equivalent to calculating the probability of coalescence in a given class, without con- 
sidering the possibility that coalescence events could have occurred in previous classes. Thus 
it converts our expression for (p^ (£) into an expression for pM'^-^ Neglecting these terms 
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also implicitly makes the non-conditional approximation, as we did in the PRF method, 
because it assumes that the fact that coalescence did not occur in previous classes does not 
distort the likelihood of taking particular paths. Making this approximation, we find 

pk,k'^k-i _ \ /\k,k' (A7\ 

" l + 2Nh k . £ s(k-£) £ ' 1 ; 

which exactly matches our expression for the coalescence probabilities in the non-conditional 
approximation in our PRF approach, Eq. (33). 

The condition 2Nhksk 3> 1 is the condition we are already assuming in treating the 
frequencies of each class, hk as constant. Thus the results from the PRF method and 
the sum of ancestral paths are exactly equivalent in the regime where they are valid. We 
discuss the correspondence between approximations in the sum of ancestral paths method 
as compared to the PRF method in more detail in Appendix D. 

THE STRUCTURE OF GENEALOGIES AND STATISTICS OF 

GENETIC DIVERSITY 

We can now use the coalescence probabilities described above to calculate the structure 
of genealogies in the presence of negative selection. We can then use these genealogies to 
calculate various statistics describing the genetic diversity within the population. We know 
the coalescent probabilities in each step of our fitness-class coalescent process, so in principle 
we can calculate the probability of any genealogy relating an arbitrary number of individuals 
using methods analogous to those used in standard neutral coalescent theory. This would 
then allow us to calculate the distribution of any statistic describing the genetic diversity 
among these individuals, again using methods analogous to neutral coalescent theory. 

Here we will focus on the simplest genealogical relationship: the distribution of the 
time to the most recent common ancestor of two individuals, which demonstrates the main 
ideas in the simplest context. This allows us to calculate the distribution of the per-site 
heterozygosity tt. This is the only statistic relevant to a sample of two individuals. In larger 
samples, provided the total number of individuals sampled is not too large, the coalescent 
probabilities between any pair of sampled individuals are independent to those between any 
other pair. Thus the distribution of per-site heterozygosity tt we expect in such a sample is 
equivalent to the distribution of tt we calculate here. 
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In our fitness-class coalescent framework, it is natural to consider diversity at the nega- 
tively selected sites separately from diversity at linked neutral sites. We focus first on the 
distribution of coalescent steptimes and 71^, the per-site heterozygosity at negatively selected 
sites alone, ignoring neutral mutations. We will then turn to the connection between step- 
times and actual times in generations, which will enable us to calculate the distribution of 
neutral diversity, including the per-site heterozygosity at neutral sites 7r n . In analyzing data, 
we will of course typically not know a priori which sites are neutral and which are negatively 
selected. In such a situation, we merely add up the expected diversity at neutral sites and 
negatively selected sites, so that the total expected per-site heterozygosity is tt — TVd + 7v n . 

Distribution of steptimes and 71^: We begin by imagining that we sample two indi- 
viduals at random from the same fitness class k. By construction, the number of negatively 
selected sites at which they will be polymorphic is twice their coalescent steptime, 71^ = 2£. 
We therefore have 

p(n d = 2l)=<f> k k (l), (48) 

where p(iid = 2£) is the probability 71^ = 2£. 

More generally, if two individuals sampled from classes k and k 1 coalesce in class k — I, 
we have 7^ = 2£ + k! — k. This means we have 

p(7r d = 2£ + k f - k\k, k') = $(£). (49) 

We can average this over the distributions of k and k 1 to find the distribution of 7^ amongst 
individuals sampled at random from the population. We find 

00 

PM = E E H ( k > k' = k + n d - 2£)(f> k k '= kwe (i), (50) 

t k=0 

where the first sum runs from £ = to the largest integer less than or equal to the smaller 
of k or 7Td/2. Note that in practice we only have to evaluate the sum over k from to a 
multiple of Ud/s, since H{k, k') will be negligible for larger k. 

These results for the distributions of genealogy lengths and of 71^ involve several sums. 
However, all the terms in these sums are straightforward and the numerical evaluations of 
their values are simple and fast. In Fig. 4 we show a representative example of the predicted 
distribution of the per-site heterozygosity at negatively selected sites, p^), compared to 
simulation results. We explore the significance of the shape of the distribution p(7r^), how 
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this distribution depends on the parameter values, and the source of the small but systematic 
deviations between the theoretical predictions and the simulation results in the Discussion. 

The relationship between steptimes and time in generations: So far we have 
focused on the genealogies measured in steptimes, which allowed us to calculate the dis- 
tribution of heterozygosity among negatively selected sites. We would now like to relate 
the steptimes to actual times in generations. To do this, we consider the probability that 
a coalescence event occurred at time i, given two individuals sampled from classes k and 
k' that coalesced in class k — I, ip(t\k, k f ,£). This can be divided into two parts: the time 
since the ancestors of these two individuals were both in class k — £ together, which has a 
distribution ^i(t|fc, k f , £), and the time to coalescence once in this class, i/j2(t\k,k',t). 

We compute these distributions in Appendix E, and find 

A(t\k,k',£) = S7v d e-< k ' + ^(e st - ir d_1 (^ *), (51) 
where we have made use of the fact that 7r d = k! — k + 2£, and 

MtW, k, I) = (2s(k -£) + j^—^j e-W^+rosh* (52) 
The total real time since coalescence is the sum of these two times, so we have 

ip(t\k', fc, £) = fc, t) ★ MW, fc, t). (53) 

We compute this convolution in Appendix E, and find 

WW, K l) = £ ^(-1)— (*< ~ { h ' + U \ -JL- ( e -« _ e -sA^ (54) 



i=0 



where we have defined A = k f + k — i and B = k — £ + — . 

Note that, making the usual approximation Nhk-£s(k — £) ^> 1, this expression can be 
simplified; we find 

^{t\k\ K i) = s(*d + l)e-< k '^\e st - ( k ' \ k \ . (55) 

\7r d + lJ 

However, it is important to note that while this approximation may be valid in the bulk of the 
distribution, it will always fail when coalescence occurs in the zero-class, where s(k — £) = 0. 
In this case, we must use the more complex expression Eq. (54) or (in the case when the 
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coalescence time within the 0-class can be neglected compared to the time taken to descend 
from the 0-class) the expression Eq. (51). 

Averaging over the possible values of k ) k' ) and t, we find the overall distribution of actual 
coalescent time between two randomly chosen individuals, 

oo oo k 

^(*) = E E EV^lM^)^ m W#(M + ™), (56) 

k=0 m=0 ^=0 

where the distributions H(k, fc + m), 0^ +m (^), and fc',^) are as given above. However, 

as we will see below, in calculating neutral diversity we will typically find it easier to work 
directly with ^(£|fc, k', £) rather than this unconditional distribution for i/j(t). 

The neutral heterozygosity 7r n : From the distributions of real times to a common 
ancestor described above, we can calculate the distribution of 7r n , the neutral heterozygosity. 
Since the neutral mutations occur as a Poisson process with rate U n , and there are a total 
of 2t generations in which these mutations can occur, 7r n follows a Poisson distribution with 
mean U n t, where t is drawn from the distribution of coalescence times, Eq. (56). We have 

PM = r ^e- %< #)*- (57) 

JO 7T n \ 

In Fig. 5, we compare this distribution of neutral heterozygosity (as modified by the correc- 
tions described in Appendix A) to direct simulations. We find good general agreement to 
the shape of the distribution, though there are slight systematic errors (presumably due to 
effects of Muller's ratchet, which we explore further in the Discussion). Note that, like our 
results for the diversity at negatively selected sites, these results differ dramatically from the 
exponential distribution a neutral model or effective population size approximation would 
predict; we describe these comparisons further in the Discussion. 

We note that to calculate the distribution of total heterozygosity n = 7r n + 7^, we must 
account for the fact that 7^ and 7r n are not independent: large 7r^ means a large coalescent 
steptime and hence makes a large 7r n more likely. The distribution of 7^ is independent of 
7r n , and is given by p(itd) above. Above we found i/j(t\k, h',t), which implies that 

p(7r n \k,k f J) = ^ 2Unt } - e- 2U ^(t\k,k f £)dt. (58) 

JO 7T n \ 

We can compute this integral; we find 

(59)' 
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where we have defind 



A = k' + k - i 



B = 2k-2£ + 



1 



(60) 



Nsh k _i 



Since 7id = 2£ + k — k', this implies 



p(7r n \k,k',£). 



(61) 



■K d =2i+k-k' 



The distribution of n is then given by 



p(7r) 



(62) 



7T, 



n 



+7T d =7T 



This is no more difficult to calculate than p(7r n ), since it involves analogous sums. However, 
while the distribution of tt is clearly important in analyzing sequence data, in this paper we 
focus on the distributions of 7T n and 7^ separately, which provides a more complete picture 
of the source of all aspects of the genetic variation. 

The mean pairwise heterozygosity: Above we have calculated the distribution of 
heterozygosity for both neutral and deleterious mutations. It is straightforward to average 
these results to calculate the mean pairwise heterozygosity for both neutral and deleteri- 
ous mutations. In Fig. 6 and Fig. 7 we show how this mean heterozygosity depends on 
population size, mutation rate, and selection strength, for neutral and deleterious mutations 
respectively. We see that the dependence of (71^) on the population size is fairly weak. While 
it increases roughly linearly with TV in the weak selection regime, this quickly saturates and 
for Ns substantially greater than 1 the mean heterozygosity becomes almost independent of 
population size. The dependence on Ud/s, by contrast, is much stronger. The dependence 
of (7r n ) on the parameters is also interesting: this depends weakly on the parameters for 
small TV or Ud/s, but for larger TV becomes roughly linear. These results make intuitive 
sense, particularly in light of the "mutation-time" approximation that we introduce in the 
Discussion, where we discuss these figures in more detail. 

Neutral heterozygosity from a sum of ancestral paths: An alternative way to 
compute neutral heterozygosity is to further extend the sum of ancestral paths approach 
which we used above to provide an alternative derivation of the coalescence probabilities. In 
this formulation, we do not make any connection to real times. This means we cannot use 
it directly to calculate the distributions of the times to most recent common ancestors of a 
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sample. However, this approach does provide an alternative way to compute the distribution 
of neutral heterozygosity, p(7v n ). We carry out this computation in Appendix G, and show 
that it leads to results identical to our analysis above. 

Statistics in larger samples: The distributions of 7r n and 7r^ described above are very 
different from the distributions of heterozygosity expected in the absence of selection. We 
could certainly measure the distribution of pairwise heterozygosity from a sample of many 
individuals from a population, and use this to infer the action of selection. However, it 
may also be useful to understand the expected distribution of other statistics describing the 
variation in larger samples. The relationship between these different statistics will typically 
be different than expected in the neutral case, making them useful in constructing other 
statistical tests for selection. 

One statistic often used to describe variation in larger samples is the total number of 
segregating sites among a sample of n individuals, S n . Here we describe how our framework 
allows us to calculate the distribution of £ 3 ; similar methods can be used to calculate the 
distribution of S n for larger n. One common test for neutrality, Tajima's D ) is based 
on a comparison between the observed values of tt and S n ] our results for S3 could in 
principle be used to show how this statistic should be expected to behave in the presence 
of purifying selection. As we will see, it is unwieldy to calculate closed form expressions for 
these quantities in our framework, so here we merely lay out a prescription for calculating 

s 3 . 

We first consider the distribution of S% , the number of segregating negatively selected sites 
among three randomly sampled individuals. In order to calculate the probability a sample 
has a particular £3, we imagine picking three individuals at random from the population 
and calculate the probability of the coalescence events that lead to that S%. We illustrate 
such a situation where three individuals are sampled from classes k, k 1 and k" in Fig. 8. 
Two of these three lineages coalesced in class k\. We call the steptime at which two of the 
three lineages coalesced t 3 (see Fig. 8). We next need to calculate the distribution of t 2 , the 
total steptime to common ancestry of the three individuals. This time of course cannot be 
smaller than t 3 . Given values of t 3 and t 2 , it is clear from Fig. 8 that the total number of 
segregating negatively selected sites is £ 3 = 2t 2 + r 3 . 

Calculating the joint distribution of t 2 and r 3 is tedious, because we must sum over all 
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possible orderings of the coalescence events, but it can be computed using either our lineage 
structure method or the sum of ancestral paths approach. The basic result is analogous to our 
results for the coalescence steptime between a pair of individuals: coalescence probabilities 
within a given class are proportional to the inverse size of that class times the number of real 
generations the ancestors of given individuals typically spend in that class, times a factor 
that reflects the time that the ancestors of sampled individuals are present in each class at 
the same time. 

The number of segregating sites Sf is given by 

= r 3 + 2r 2 - (fc" -k)- (k" - k'). (63) 

Thus using the distributions of r 3 and r 2 , and averaging over the distributions of fc, fc', and 
k h \ we can calculate the full distribution of S|. Given a particular value of S3, there is 
a relationship between the steptimes and actual times (analogous to Eq. (54)), which we 
could use to find the distribution of the total number of segregating neutral sites S3 . More 
complex statistics involving even larger samples can be computed using similar methods. 

However, while this analysis provides a prescription for calculating the distribution of S3 
and S3 , it is clear that the full distributions are opaque. In the Discussion we provide a 
simple approximation for S n in a specific parameter regime we refer to as the "mutation- 
time" regime, but the complexities of the general calculation are tangential to the ideas 
behind our framework, so we do not pursue them further here. However, these issues will 
be important to explore in future work aiming to use this framework for data analysis, and 
our approach here can be used as the basis for genealogical simulations. Further, since our 
methods allow us to quickly compute the probability of a given genealogical history and to 
draw a particular genealogy from the appropriate distribution, they may provide a useful 
basis for importance sampling or MCMC methods to infer selection pressures from data. 

NUMERICAL SIMULATIONS OF THE GENETIC DIVERSITY 

We compare the predictions of our fitness-class coalescence analysis to Monte Carlo simula- 
tions of the Wright-Fisher model. In our simulations, we consider a population of constant 
size TV and we keep track of the frequencies of all genotypes over successive, discrete gen- 
erations. In each generation, TV individuals are sampled with replacement from the preced- 

30 



ing generation, according to the standard Wright-Fisher multinomial sampling procedure 
(Ewens, 2004) in which the chance of sampling an individual is determined by its fitness 
relative to the population mean fitness. 

In our simulations, each genotype is characterized by the set of sites at which it harbors 
deleterious mutations and the set of sites at which it harbors neutral mutations. In each 
generation, a Poisson number of deleterious mutations are introduced, with mean NUd, and 
a Poisson number of neutral mutations are introduced, with mean NU n \ each new mutation 
is ascribed to a novel site, indexed by a random number. The mutations are distributed 
randomly and independently among the individuals in the population (so that a single 
individual might receive multiple mutations in a given generation). The simulations record 
the time (in generations) at which each distinct genotype was first introduced. 

Starting from a monomorphic population, all simulations were run for at least - g ln(Ud/s) 
or TV generations (whichever was larger), to ensure relaxation both to the steady-state 
mutation-selection equilibrium and to the PRF equilibrium of allelic frequencies within each 
fitness class. The final state of the population — i.e. the frequencies of all surviving geno- 
types — was recorded at the last generation. In order to produce the empirical distributions 
of 7id, and 7T n shown in Fig. 4 and Fig. 5, we averaged across at least 300 independent 
populations for each parameter set. 

Our simulations allow for random fluctuations in the frequencies of each fitness class, and 
for Muller's ratchet. In most of the parameter regimes we explored, the ratchet proceeded 
during the simulation, so that the least loaded class at the end of each simulation typically 
contained anywhere from no deleterious mutations (typical for Ud/s = 2) to more than a 
dozen (typical for Ud/s = 4). We see that despite these effects, our theory agrees well with 
the simulations, although there are small systematic errors that are signatures of the effects 
of the ratchet. Generally speaking these errors increase as we increase Ud/s, but become 
less severe for larger TV or s. We consider these effects of Muller's ratchet in more detail in 
the Discussion. 



DISCUSSION 



In recent years, both experimental studies and sequence data have pointed to the general 
importance of selective forces among many linked variants in microbial and viral populations, 
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and on short distance scales in the genomes of sexual organisms (Hahn, 2008). Our analysis 
provides a framework for understanding how one particular type of selection — pervasive 
purifying (i.e. negative) selection against deleterious mutations - - affects the structure 
of genetic variation at the negatively selected sites themselves and at linked neutral loci. 
This type of selection is presumably widespread in many populations, in which there is a 
selective pressure to maintain existing genotypes and mutations away from these genotypes 
at a variety of loci are deleterious. 

A variety of earlier work has addressed aspects of this problem, as described in the In- 
troduction. The key insight of our approach is that instead of following the true ancestral 
process, we develop a fitness- class genealogical approach which focuses on how individuals 
"move" through the fitness distribution. Here each mutation plays the role of a reproduc- 
tive event that moves individuals through the fitness distribution, and each fitness class is 
a "generation" in which coalescence can occur with some probability. We calculate this 
probability using a simple approximation based on the PRF model of Sawyer and Hartl 
(1992), rather than by considering the actual reproductive process within that class. By 
extending formulas originally computed by Hudson and Kaplan (1988) and Barton and 
Etheridge (2004), we showed that these coalescent probabilities can also be computed us- 
ing a summation of ancestral paths based on the structured coalescent described by Hudson 
and Kaplan (1994). Hence the conclusions from our analysis also describe the simulations 
of Gordo et al (2002) and are consistent with all other results based on this structured coa- 
lescent approach. Our work is also closely related to recent work in continuous-fitness model 
by O 'Fallon et al. (2010), which uses a similar framework to analyze the weak-selection 
regime but not the Ns 3> 1 situation we study here. 

Our approach leads to simple expressions for the coalescent probability at each step in 
our fitness-class genealogical process. This makes it a complete effective coalescent theory: 
using these probabilities, we can calculate the probability that a sample of individuals has 
any particular ancestral relationship. Our coalescent probabilities are different from those 
in the standard Kingman coalescent (Kingman, 1982), so the structure of genealogies has 
a different form. 

Of course, since our process is an effective rather than an actual coalescent, the rela- 
tionship between a fitness-class genealogy and the expected statistics of genetic variation 
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given that genealogy is different than in the standard neutral coalescent. Given a particular 
genealogy measured in steptimes, the numbers of deleterious mutations are the coalescent 
times, and to calculate the statistics of neutral variation we have to make use of the rela- 
tionship between steptimes and actual coalescence times. This contrasts with the Kingman 
coalescent, where numbers of neutral mutations are typically Poisson-distributed variables 
with means proportional to coalescence times (Wakeley, 2009). However, we can account 
for these differences by starting with the distribution of fitness-class genealogies and then 
converting these genealogies into actual coalescence times. 

In this paper, we have used this fitness-class approach to calculate simple statistics de- 
scribing genetic variation, in particular the distribution of pairwise heterozygosity. This 
leads to analytic expressions for the quantities of interest, although these expressions in- 
volve sums which are most easily calculated numerically. These are easy to compute, and do 
not become harder to evaluate in larger populations, and hence are more efficient to evaluate 
than either simulations or calculations within the ancestral selection graph. 

An Intuitive Picture of the Structure of Genealogies: The most important aspect 
of our analysis is not the specific results for heterozygosity, which match the conclusions 
of earlier simulations. Rather, the fitness-class coalescent approach allows us to draw sev- 
eral important general conclusions about how negative selection distorts the structure of 
genealogies. For two individuals drawn from particular fitness classes, the effect of negative 
selection is similar to that of an effective population size that changes as time recedes into 
the past, as has been suggested by earlier work. However, this is not a population size that 
decreases in a simple way into the past. Our analysis shows the exact form of this time de- 
pendent population size. Further, it is clear from our analysis that this is not the only effect 
of negative selection on genealogies. There are two key complications. First, the statistics of 
genetic variation (particularly at the deleterious sites themselves) depend on the structure of 
genealogies differently in our fitness-class coalescent than in the standard neutral coalescent. 
Second, different pairs of individuals have a different time-varying effective population size. 
This means that genetic diversity cannot be represented by a single time-varying effective 
N e (t) for the whole population, which means that it may be possible to develop statistical 
tests to distinguish negative selection from population size. All of these general intuitive 
conclusions about the structure of genealogies in our fitness-class coalescent are illustrated 
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in Fig. 9. 

We now pause to make this intuitive picture of the shape of typical genealogies more 
precise. In general the probability that two individuals will coalesce within class k has the 
form P c & ^ ^jfe5 where is the population size of that class, sk is the effective selection 
pressure against individuals within that class, and A is a constant that depends on which 
classes the lineages began in, but not on any of the population parameters. We have seen 
that each lineage spends on average ^ generations in class k. Thus we can think of each 
individual as seeing a historical effective population size as shown in Fig. 9c: it starts in 
some class k with size n& and spends ^ generations in that class before moving to class 
k — 1, and so on. 

If we sample two individuals, however, they will not always be in the same class at the 
same time. This effect reduces the coalescence probabilities in each class, as captured by 
the factor A/2. This factor is the average fraction of the ^ generations each lineage spends 
in class k that the two lineages spend there together. Alternatively, we can think of this 
factor as consisting of two parts: A is the probability that the two lineages are ever in the 
same class at the same time, and ^ is the average amount of time that they coexist in the 
class if they coexist at all (they each spend on average ^ generations there, but on average 
overlap for only half this time if they overlap at all) . While the two lineages are in the class 
at the same time, the per-generation coalescent probability is 

This logic implies that genealogies in the presence of purifying selection look like neutral 
genealogies with a specific type of historical population size dependence. Imagine for example 
we picked two individuals from the same fitness class k. They each spend on average ^ 
generations in class fc, and during that time they have a probability per (real) generation 
of coalescing (this probability includes the fact that on average they are both in the class 
simultaneously for only a fraction of the mean time each spends there). So roughly speaking, 
they have an effective population size of N e ~ 2nk/A^ for the first ^ generations. If they 
fail to coalesce, they then move to class k — 1, where they spend s ^_^ generations and 
have a probability 4—^— per generation of coalescing, and hence an effective population size 

^ n k — l 

N e ~ 2rik-i/A^ 1 for this time. If they again fail to coalesce, they move to class k — 2, and 
so on. 

So far, this picture of a time-dependent population size is rather crude, but we can make it 
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more precise. Specifically, we can write the coalescence probability between two individuals 
sampled from class k and k f as a function of time in generations as 

mk,«) = j2$W^,kr,e). (64) 

£=0 

We can then define the time-dependent effective population size between these individuals, 
N e (t), as the inverse probability of coalescence at time t given that coalescence has not yet 
occurred, 

N e (t) 1- f*ip(t f \k,k f )dt r [ ) 

In other words, the N e (t) is defined as usual as the inverse of the probability that the two 

individuals will coalesce at time t given that they have not yet done so. 

We illustrate this precise time-dependent population size N e (t) in Fig. 9d. We see that 
for two individuals sampled from the same fitness class, N e (t) typically increases into the 
recent past and then decreases into the more distant path. This reflects the fact that the 
two individuals are becoming less likely to be in the same fitness class in the recent past, but 
that as time recedes into the distant past they are likely to be in the highly fit classes which 
have smaller n^. For two individuals sampled from classes near but not identical to each 
other, N e (t) starts high and then drops before exhibiting a pattern similar to that among 
individuals sampled from the same class. This reflects the fact that it takes at least a short 
time before the two individuals have any chance of being in the same class. Finally, for 
two individuals sampled from more distant classes, N e (t) simply declines into the past, both 
because longer ago they were more likely to be in the same class and more likely to be in 
the small classes near the high-fitness tail. 

Averaging over the whole population, Fig. 9d shows the precise time-dependent popu- 
lation size N e (t) for two randomly sampled individuals. This average N e (t) initially stays 
roughly constant as time recedes into the past before decreasing thereafter. For these two 
randomly sampled individuals, selection is indistinguishable from this particular histori- 
cally varying population size (although this particular type of variation in population size 
is presumably rather unusual). The distribution of coalescence times between this pair of 
individuals looks the same as neutral coalescent histories with this specific population size 
history. The deleterious mutation rates and selection pressures only matter in that they 
determine the form of this population size history. 
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However, a key difference from a neutral population of time- varying size is that, as is clear 
in Fig. 9d, pairs of individuals do not typically come from the same fitness class. Rather, 
they come at random from different parts of the fitness distribution, and those that come 
from different places have ancestries characterized by different historically varying population 
sizes. The total distribution of ancestry is the sum of all of these. In other words, the genetic 
variation within the population is like that in a population where some individuals had one 
type of historical population size history, while others had another. If we restrict ourselves 
to pairwise statistics such as 7r, the average N e (t) across pairs of individuals will accurately 
describe the genetic diversity. However, when we consider appropriately defined statistics 
in larger samples, the fact that there is no single N e (t) for the whole population could be 
important. It remains an interesting question for future work to explore how to exploit this 
fact to develop statistical tests to distinguish the effects of purifying selection from that of 
a historically varying effective population size. 

Approximations underlying our approach: Our analysis relies on three key approx- 
imations. First, both our lineage-structure and our sum of ancestral paths methods assume 
that we can neglect fluctuations in the total frequency hk of each class. Related to this 
approximation, we have also implicitly assumed that the probability a lineage in class k 
reaches a frequency close to hk can be neglected. In Desai et al (2010), we analyzed these 
approximations in detail and showed that they will hold in class k whenever Nhksk 3> 1. In 
practice, this condition will often break down in the high and low-fitness tails of the fitness 
distribution. Fortunately, provided it holds in the bulk of the distribution in which most 
individuals will be sampled (which will typically be true provided Ns 3> 1), our approach 
will still be a good approximation. 

Our second key approximation is the non-conditional approximation, which we discuss 
in more detail in Appendix A. This approximation is also made in a more subtle way in the 
summing over ancestral paths method, as described in Appendix D, though we note that in 
computing the distribution of 7r^ it is possible to avoid this approximation in this method. 

Our final and most important approximation is that we assume that Muller's ratchet 
can be neglected. We can think of this as the most extreme aspect of our approximation 
neglecting fluctuations in the sizes of each fitness class. This approximation can sometimes 
be problematic; we discuss it in detail below. 
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Although we have focused primarily on situations when selection is weak compared to 
total deleterious mutation rates, our approach is also valid regardless of whether s is strong or 
weak compared to Ud> However, when selection is sufficiently strong (Ns 3> 1 and Ud/s < 1), 
then an effective population size approximation accurately describes the patterns of genetic 
variation, as we describe below. Thus our methods are primarily useful for situations where 
selection is weak compared to mutation rates. 

Relationship with an effective population size approximation: Charlesworth 
et al (1993) considered how selection against many linked deleterious mutations affects 
linked neutral diversity in a model identical to ours. These authors found that when selection 
is sufficiently strong, the shape of genealogies and hence the statistics of variation at linked 
neutral sites is identical to the neutral case, with a reduced effective population size. We 
refer to this as the effective population size (EPS) approximation. 

The idea behind the EPS approximation is that when selection is strong, deleterious 
mutations are quickly eliminated from the population by selection. Thus if we sample in- 
dividuals from the population, they must have very recently descended from individuals 
within the class of individuals which had no deleterious mutations (the 0-class). The EPS 
approximation assumes that the time for this to happen can be neglected, and that indi- 
viduals never coalesce before it does. These individuals then coalesce within the 0-class as 
a neutral process with effective population size equal to the size of that 0-class, which is 
Ne~ Ud / s . Thus the genetic diversity within the population is identical to that in a neutral 
population of reduced size N e = Ne~ Ud ^ s . 

The EPS approximation is valid provided that the neutral coalescence time within the 
0-class, t neut , is large compared to the time it takes for a typical individual to have descended 
from the 0-class, tdesc We know t neut ~ Ne~ Ud ^ s ) and since a typical individual comes from 
fitness class k ~ Ud/s, we have that tdesc ~ Y^i=\ ~ - In (—\ This means that the EPS 

J JS S \ S J 

approximation will be valid provided 



Because of the exponential term on the left hand side of this expression, it is clear that 
the EPS approximation is a strong-selection, weak- mutation limit. It will tend to be valid 
provided that Ns > 1 and Ud < s, but whenever Ud becomes much larger than s, it will 
typically break down even in enormous populations. 




(66) 
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Our analysis describes the effects of background selection beyond the EPS approximation. 
We do not assume that the coalescence time through the fitness distribution is small com- 
pared to the coalescence times within the 0-class, or that coalescence cannot occur among 
individuals carrying deleterious mutations. It is precisely these two effects that lead to dis- 
tortions away from the neutral expectations, making it impossible to describe genealogies 
using neutral theory with a revised effective population size. Although our analysis is a 
generalization of the EPS approximation, it is not inconsistent with it. However, we have 
focused primarily on situations where the EPS approximation breaks down, and coalescence 
times through the fitness distribution are large compared to those in the 0-class, because 
this is the situation where our approach is most useful. 

Note also that in many situations it may be the case that there are many linked weakly 
selected mutations and many linked strongly selected mutations. In such circumstances, the 
process we consider and the EPS approximation can act simultaneously, each for different 
classes of mutations. Imagine we had one class of mutations with fitness cost si which 
occur with mutation rate C7i, where U\ < si and Nsi 3> 1 so that the EPS approximation 
applies. At the same time, imagine another class of mutations with fitness cost s 2 which 
occur with mutation rate £7 2 , where £7 2 3> ^2 so that the EPS approximation breaks down 
for these mutations. In this case, the genetic diversity we expect to see will be characteristic 
of our fitness-class coalescent theory (with Ud = U2 and s — S2), but with a reduced 
effective population size N e = Ne~ Ul ^ Sl . In other words, the strongly selected mutations 
reduce the effective population size because all individuals are very recently descended from 
an individual that had no large-effect mutations, but the coalescence time through the 
distribution of weakly selected mutations cannot be neglected. 

A "Mutation-time" Approximation: We have seen that our analysis accounts for 
two effects missing from the EPS approximation: coalescence events outside the 0-class, 
and the time it takes for individuals to have descended from the 0-class. Whenever Ud/s 
and N are both sufficiently large, the former effect can be neglected while the latter is 
still important, because the number of lineages in each fitness class becomes large and 
hence coalescence events are very unlikely to occur outside of the 0-class. This leads to 
an approximation which we can think of as a generalization of the EPS approximation. 
Rather than considering primarily the diversity generated within the most-fit background, 
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we focus instead on the diversity that accumulates while lineages move between different 
less-fit backgrounds. Hence we term this approach a "mutation-time approximation" (MTA) 
for short. In this approximation, we assume that all individuals coalesce within the 0-class, 
as with the EPS approximation. However, unlike the EPS approximation, we consider the 
time it took for individuals to descend from the 0-class in addition to the coalescence time 
within the 0-class. This approximation is valid for large TV (when even Nh\ is enormous 
compared to ^) so that coalescence always occurs in the 0-class. 

In this mutation-time approximation our results become much simpler and provide a 
useful intuitive picture of the structure of genealogies and genetic variation. Consider the 
deleterious heterozygosity TTd of two individuals sampled from fitness classes k and k' . In 
this approximation, these two individuals always coalesce in the 0-class so we always have 

= k + k f . Since two individuals are sampled from classes k and k 1 with probability 
H{k ) A 7 ), the distribution of TTd in the population as a whole is extremely simple: we have 



This simple approximation makes it clear why the distribution of TTd looks the way it does, 
and explains how it varies with Ud/s and with TV, both in this mutation-time approximation 
and more generally. For large TV, when coalescence outside the 0-class can be neglected, two 
individuals from class k and k 1 have TTd = k + k! . Thus the distribution of TTd has roughly 
the same shape as the distribution of fitness within the population. The mean TTd is 2Ud/s, 
since the average individual comes from class k = Ud/s. Smaller and larger TTd are less 
likely; the distribution of fitness in the population has variance equal to the mean, so the 
variance of the distribution of TTd is also roughly equal to its mean. As A gets smaller, there 
is sometimes coalescence outside of the 0-class. This reduces TTd given k and k 1 . Hence as 
we reduce A, the distribution of TTd shifts somewhat leftwards, with a peak somewhat below 
2Ud/s, and has slightly more variance since there is a less definite correspondence between 
fc, A/, and TTd- Since tt u is determined by 71^, this also explains why the distribution of tt u 
has the peaked form we observe, and how it depends on Ud/s and A (note that for tt u the 
coalescence time within the 0-class, which increases linearly with A, must also be included). 
All of these intuitive expectations are reflected in our results, as shown in Fig. 4, Fig. 5, 
Fig. 6, and Fig. 7. Note for example that in Fig. 4, the peak of TTd is slightly below 2Ud/s 
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(reflecting the finite population size) and has variance about equal to its mean; we have 
verified that as TV increases the shape of the distribution remains roughly the same, but the 
mean increases towards 2Ud/s and the variance decreases slightly. 

More complex statistics of sequence variation are similarly straightforward to calculate in 
the mutation-time approximation. When considering larger samples, the genetic diversity 
is determined by the fitness classes these individuals come from, which is always simple 
since the probability a given individual is sampled from fitness class k is just the Poisson- 
distributed h^. This approximation may therefore prove useful in developing simple and 
intuitive expressions for various statistics. For example, we can use this approximation 
to calculate a simple expression for the distribution of the total number of segregating 
negatively selected sites in a sample of size n, which as we have seen above is otherwise 
rather involved. We have 

P(S d n = x)= h kl h k2 ...h kn , (68) 

ki,k 2 ,...k n 

where the sum is over sets of the k { that sum to x. We find 

p(# = aO=e-^/'i(^y\ (69) 

This is a distribution which is peaked around a mean value of ^ , for the same reasons the 
distribution of 7r d looks as it does. We note however that as we increase the sample size n 
the population size TV must be even larger for this MTA approximation to hold. 

We can also calculate the distributions of actual coalescence times and hence the distribu- 
tions of statistics describing neutral diversity in the mutation-time approximation. Consider 
the distribution of the real coalescence time between two individuals chosen from classes k 
and k' . In the mutation-time approximation where the coalescence time within the 0-class 
can be neglected, the actual coalescence time is as given in Eq. (51), 

life, k') = s(k + k')e-< k+k '^ (e st - l)^" 1 . (70) 

Averaging over the values of k and k\ we have 

k' oo 

^) = EE ff (*.*')#.^ ( 71 ) 

k=0 k'=0 
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The distribution of coalescence times once within the 0-class is, as before, ^(t) = Wh^ e ~ t ^ Nh ^ • 
From this distribution of real coalescence times, we can find the distribution of neutral het- 
erozygosity 7r n in the usual way, 



We can immediately see that the average coalescence time in this MTA approximation is 
t & J2o d ^ S j- +Nh Q ^ ^ In (Ud/s) + Nh G . We therefore expect that the neutral heterozygosity 
will on average be 



The first term in this expression comes from the time to descend through the fitness distribu- 
tion, while the second term comes from the time to coalesce within the 0-class. If this latter 
term is large compared to the former, the EPS approximation applies. In the opposite case 
where the time to descend through the distribution dominates, we can see from the MTA 
approximation that, as with 71^, the shape of this distribution of 7r n is primarily determined 
by the shape of if (fc, k f ). In this case, we find that the peak in h k at k = Ud/s leads to the 
peak in the distribution of real times and hence the peak in the distribution of 7r n . The width 
of the distribution of 7T n is somewhat wider, however, since even given individuals coming 
from fitness classes near the mean, there is a broad distribution of possible real times, and 
a broad distribution of 7r n even given a particular real time. 

This average heterozygosity would correspond to an effective population size of 



but as we have seen this effective population size cannot correctly describe the full distribu- 
tion of 7T n nor its relationship to other statistics describing the genetic diversity. For smaller 
values of N where the mutation-time approximation breaks down, the average 7r n would be 
somewhat lower than the MTA predicts, and its distribution somewhat broader. 

Muller's Ratchet: We have neglected Muller's ratchet throughout our analysis, and as- 
sumed that the fitness distribution hk is fixed. Yet Muller's ratchet will certainly occur, and 
in some circumstances could have a significant impact on genetic diversity (Gordo et al, 
2002; Seger et al, 2010). Thus this is a potentially important omission from our theory. 
In this section we discuss some of the complications associated with Muller's ratchet that 
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are important to keep in mind when considering our approach. We discuss the parameter 
regimes where neglecting Muller's ratchet should be reasonable, and those where it is likely 
to cause more serious problems. We provide rough estimates of how large we expect these 
problems to be, and suggest a few possible ways in which future work might incorporate 
Muller's ratchet into our general framework. 

Muller's ratchet causes several related problems within our theoretical framework. First, 
it causes the values of hk to change with time, and means they may not always follow a 
Poisson distribution. This changes the distribution of lineage frequencies within each class, 
and hence changes the coalescence probabilities. After a "click" of the ratchet, the whole 
distribution hk shifts in a complicated way, eventually reaching a new state where it is shifted 
left (so the class that was originally at frequency hk is now at frequency h^-i, and so on). 
In a similarly complex way, the PRF distribution of lineage frequencies in class k shifts from 
fk to fk-ij and so on. This naturally changes the coalescence probabilities in each class. 
Fortunately, since the coalescence probabilities in class k are generally very similar to those 
in classes k + 1 or k — 1, this effect is unlikely to lead to major inaccuracies provided the 
ratchet does not click many times within a coalescent time. This is true except when we 
start considering coalescence in classes close to the 0-class, where the fc-dependence becomes 
significant. This can be thought of as an additional problem associated with Muller's ratchet, 
and is associated with the fact that the ratchet shifts the whole fitness distribution. This 
effect is easiest to see with an example: imagine we sample two individuals within the fc-class, 
and that these individuals did not coalesce before their ancestors were both in the 0-class. 
At the time (in the past) when these individuals' ancestors were in the 0-class, this current 
0-class might have been the 1-class or 2-class (or higher). Thus these two individuals within 
the 0-class might not coalesce until, for example, their ancestors were in what is currently 
the "— 2"-class. This clearly means that we might in fact have 7r d > 2k, which our analysis 
assumes is impossible. In fact, we observe precisely this effect in simulations, and it is the 
reason why we commonly observe systematic deviations where the simulated values of 7^ 
are larger than our theory predicts. 

From this discussion it is clear that the key factor in determining whether Muller's ratchet 
can reasonably be neglected is how many times the ratchet "clicks" in a coalescence time. 
We have seen above that an average individual coalesces through the fitness distribution in 
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a time at most of order Mn([/d/s) generations. Once within the 0-class, coalescence times 
are of order Ne~ Ud ^ s . We must compare these times to the time it takes for the ratchet to 
"click." The rate of the ratchet is a complex issue that has been analyzed by GORDO and 
Charlesworth (2000a), Gordo and Charlesworth (2000b), and Kim and Stephan 
(2002) in the regime where Ne~ Ud ^ s > 1 and by Gessler (1995) in the regime where 
N e - u d/s < i n g enera l analytic expressions exist which are valid across all parameter 
regimes. However, provided the ratchet does not typically move a substantial fraction of the 
width of the fitness distribution in the coalescence time of two random individuals, it will 
be a small correction to 71^, and neglecting it is a reasonable first approximation. In practice 
we find in our simulations that for the parameter regimes we consider, the ratchet causes 71^ 
to be at most of order 2 larger than our theoretical predictions, corresponding roughly to a 
single click of the ratchet during a typical coalescence time. 

The discussion above suggests a way to incorporate Muller's ratchet within our theoretical 
framework, albeit in an ad-hoc way. The ratchet shifts the distribution hk underneath the 
fitness-class coalescent process. The details of this shift are complicated, but on average 
every click of the ratchet shifts the distribution one step to the left. We can define k min to 
be the number of deleterious mutations (relative to the optimal genotype) in the most-fit 
individual at any given time. For the case where Ne~ Ud ^ s > 1, the rest of the distribution 
will be approximately a Poisson distribution, but with hk replaced by hk-k min - Muller's 
ratchet can then be thought of as a process by which k min increases over time. This increase 
is a random process, but has some average rate, leading to an average k min (t). As we look 
backwards in time during the fitness-class coalescent process, the value of k min is decreasing 
due to Muller's ratchet. This suggests a simple approximation: we replace the actual value 
of k with an "effective" value of k that accounts for the fact that k min decreases as we look 
backwards in time. For each step through the fitness distribution, we imagine that kjYii n has 
decreased by the appropriate amount, and hence the effective value of k in the new fitness 
class is decreased by less than 1 compared to the old fitness class. When Ne~ Ud ^ s < 1 the 
ratchet is an almost deterministic process, so a similar approximation may prove useful, but 
in this case the distribution hk is on average shifted from the Poisson form (Gessler, 1995). 
To incorporate the ratchet into our analysis in this situation, we first must recalculate the 
relevant coalescence probabilities given the expected average form of and then carry out 
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the above program. These and other methods to account for Muller's ratchet remain an 
interesting topic for future work. 

Despite the potential relevance of Muller's ratchet in practical situations, we note that it 
does not affect our results in the standard coalescent limit. As is apparent from our general 
expressions for the coalescence probabilities, the structure of our fitness-class coalescent 
theory does not depend on all three parameters TV, Ud, and s independently. Rather, it 
depends only on the combinations NUd and Ns. Thus our theory makes sense in the 
standard limit where NUd and Ns are held constant while we take TV —> oc. In this limit, 
Muller's ratchet does not occur. Whether this means we can neglect the ratchet for large 
but finite TV depends on the convergence properties of the coalescent limit. This is a difficult 
limit to explore with simulations, because it requires large population sizes. However, we 
have used simulations to verify in a few cases that, as expected, increasing TV while keeping 
NUd and Ns constant does not change the predicted structure of genealogies but decreases 
some of the systematic differences between theoretical predictions and the simulations which 
are suggestive of the effect of the ratchet. Note that while this ratchet- free limit does not 
change the structure of genealogies in our fitness-class coalescent, the distribution of real 
coalescent times does change, since all real timescales are proportional to s. Thus, as might 
be expected, we must also take NU n constant as TV — >• oc if we wish neutral diversity to also 
remain unaffected in this limit. 

Note that this ratchet-free limit, while fairly standard in coalescent theory, is somewhat 
different from the mutation-time approximation we discussed above. Of course, we can easily 
imagine a population which is large enough that the mutation-time approximation applies, 
and then take the standard coalescent limit. 

Conclusion: Our fitness-class coalescent approach provides a framework in which we can 
compute distributions of genealogical structures in situations where many linked negatively 
selected sites distort patterns of genetic variation. We have used this framework to calculate 
the distributions of a few simple statistics describing sequence variation. It remains for 
future work to use this fitness-class coalescent approach to compute a wide array of statistics 
to better understand the details of how purifying selection on many linked sites distorts 
patterns of genetic variation. The eventual goal will be to use our results to help interpret 
the increasing amounts of sequence data which seem to point to the importance of negative 
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selection on many linked sites. 
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APPENDIX A: THE FULL CONDITIONAL CALCULATION 



In the main text, we focused primarily on the non-conditional approximation to the coales- 
cence probabilities, which led to our simple expression for the coalescenct probabilities, Eq. 
(33). We saw in the main text that this non-conditional approximation can be relaxed by 
keeping the higher order terms in Eq. (46). In this Appendix, we show how this approxima- 
tion can be relaxed in our lineage-structure framework by carrying out the full conditional 
calculation for some of the simplest possible cases. We use this to understand the structure 
of the conditional results and discuss the validity of the non-conditional approximation. We 
note that the full conditional result can also be obtained from the sum of ancestral paths ap- 
proach, as described in Appendix D, and the validity of the non-conditional approximation 
can be directly assessed with that approach. 

We begin by considering the full conditional result for the probability that two individuals 
both sampled from class k coalesce in class k — 2. In the main text we found that this 
coalescence probability is 

2 

pkk ^k-2 = f gfe-2 (ii)i2) ^_^ /fc _ 2(x) ^ [ _ s(fc _ 2) | ii _ t2 | ]dMt2<fa (75) 

J ' {hk-2) 2 

= I k x ~ 2 J Q^kihM) exp [s(k - 2)|*i - t 2 \] dhdt 2 . (76) 

In order to evaluate this integral, we need to determine the probability distribution of 
mutant timings ^ 2 (ii, ^2)- The time t\ is now the sum of the time for one individual to 
have mutated from class k — 2 to class k — 1 plus the time for it to have mutated from class 
k — 1 to class k ) and analogously for £ 2 - However, in order for the two lineages to coalesce 
in class k — 2, they must not have coalesced in class k — 1. To illustrate the main point, we 
neglect the distortion in the mutant timings due to the fact that individuals did not coalesce 
in class k and focus only on the distortions due to the fact that coalescence did not occur in 
class k — 1; if desired, the former distortion can also be included using analogous methods. 
We refer to the probability distribution of the times when these individuals mutated from 
class k — 1 to class k conditional on them not having coalesced in class k — 1 as Q^^i; h\nc). 
The distribution of the times for these individuals to then have mutated from class k — 2 to 
class — 1 is then given by 

QX;t%=[s{k-l)\ 2 e-< k - l ^\ (77) 
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as in the first step. Thus the distribution of t\ and £ 2 is given by 



^k-l 



k-2 



(78) 



where * indicates a convolution. Note that much of the time when the individuals did 
coalesce in class k — 1, they did so because t\ happened to be close to i 2 (since this increases 
the chance the two individuals mutated from the same lineage). Thus in Q^ 1 ^, ^l^c), 
t\ and i 2 are on average further apart than in Q^^i, £2)5 an d t\ and i 2 are no longer 
independent random variables. 

We now need to calculate Q^^i, i 2 |nc). We have 



Qik(tiM nc ) = 



(79) 



where Q^^i, is the distribution of timings of mutations from class k — 1 to k given 
that the lineages do coalesce in class k — 1. Applying the general probability identity 
P(ti,i 2 |c) = p^yP(c|£i, i 2 )P(ii, i 2 ), and reading off the coalescence probability given ii and 
i 2 from Eq. (23), we find that 

jk-l 



Qk,k (tlM c ) - jjk^k- 



nQt^i^e-^- 1 ^-' 2 '. 



(80) 



We therefore find 



Q&ihMnc) = Ijm [{skfe~ sk{t ^ - /fc-i( sA; )2 e -2M*i+* 2 ) e - S (fc-i)l*i-* 2 |] . (81) 

1 Pc ' 

Plugging this into our convolution formula for Q^ 2 (ii,t 2 ) and evaluating the integrals 
by separating out the possible time orderings, we find 



Qk,k (*1j*2) 



_ k 2 [s(k - l)] 2 (k _ 1)(tl+t2) 



l-Pc 

where we have defined 
B = 



,k,k^k-l 



rk-1 



(l-e-*) (l- c -*»)-^B 



(82) 



1 



(k-2) 
1 



-2s min(ti,t2) 



k 



^ _ e -sfcmin(ti,t 2 )^ 



k 



(1 



3 -2fc|ti-t 2 | 



)(< 



-2smin(ti,t2) — sfcmin 



in(ti,t 2 )^ 



(83) 



We can now use this expression in Eq. (28) to calculate the coalescence probability P c 



k,k^k-2 



Since the result is tedious and does not further illuminate the structure of the full conditional 
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calculation, we do not do so explicitly here, but the integrals are straightforward to evaluate 
with the methods we have used above. 

To motivate the validity of the non-conditional approximation, we need to consider the 
full calculation going back one additional step. Thus we consider the probability that two 
individuals both sampled from class k coalesce in class k — 3, p^ k ^ k ~ 3 t This will be given 

by 

P^ k ~ Z = [ Q{?(*i, t^^f^e-^-^-^dhdhdx, (84) 
J n k-s 

where here Q k ~ k s (ti,t 2 ) is the distribution of the time at which the ancestors of the two 
sampled individuals originally mutated from class k — 3 to class k — 2, conditional on them 
not coalescing in classes k — 2 or k — 1. 

We can calculate Qk~k(ti^t 2 ) in the same way we calculated Q^ 2 (*i, *2)« Explicitly, 



fc-3 



(85) 



where analogously to the expression in the previous step 

Q k k ; k 2 {hMnc) = * fc _ 2 [Qt^ihM) - Qt; k \hM\c)P k c ^ k - 2 

1 fc 



(86) 



We note that Q kk (ti,t 2 ) is the expression in Eq. (82) we calculated above. As before, we 
have 

QtfihM^P^- 2 = /r 2 Qt 2 (^)e- s(fc - 2)|il -< 2 ', (87) 
hence we can write 

Q k k ~k{ti ) t 2 ) 



Ql k 2 {t u t 2 \nc) 



I _ jk-2 e -s(k-2)\t!-t 2 \ 



(88) 



Plugging the above expression back into Eq. 85, we obtain 

s 2 (k-l) 2 k 2 s 2 (k-2) 2 .. rt2 ru 



Q k h (ti,t 2 ) 



( 1 _P ( f.^-i)( 1 _P ( 



•k , k y k 2 



-s(k-2)(t 



) 



Jo JO 



e s(k-2)(y+z) e s(k-l)(y+z) 



X 



]_ _ jk-2 e -s(k-z)\y-z\ 



(1 - e~ sy )(l - e~ 



Tk-l 

X 

k-2 

->/c-3 



B 



(89) 



We could evaluate the integrals in the above expression for Q k ~ k (ti,t 2 ) in the same way 
that we did in our calculation for Q k ~ k 2 (ti,t 2 ). We would then substitute this result for 
Q k ~ k s (ti,t 2 ) into an analogous calculation of Q k ~ k 4 (ti,t 2 ), and so on. In this way we can 
build up the full conditional results. The most useful way to go about this is to separate the 
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results into powers of I X) which is a small parameter related to the coalescent probability 
in each step. We see from the expression for Q^kitut^) that there is a term in (l^) , 
which is exactly the non-conditional approximation. There are two terms involving {I x ) l ) 
and a single term involving (I x ) 2 - In general, in the expression for Qjji~^(ii, £2)? we will have 
one (I x )° term (which equals the result in the non-conditional approximation) plus £ terms 
proportional to I x , terms proportional to (/ x ) 2 , and so on. Fortunately, the dependence 
on the population parameters is entirely contained within these powers of I x . That is, the 
coefficients of these various powers of I x depend only on k and £, and not at all on the 
population parameters TV, s, and Ud- Thus we could simply calculate a table of coefficients 
once, and then would be able to understand all the distributions of mutant timings (and 
from this all the coalescent probabilities). 

In practice, it is easier to make these full conditional calculations within the sum of 
ancestral paths approach. As we have seen in the main text, that approach leads naturally 
to a power series in I x of exactly the form described above, in which the leading order term 
is the non-conditional approximation and the additional terms represent the conditional 
corrections. This calculation shows that provided which is true provided our usual 

condition that Nhksk 3> 1 holds, these higher order terms are all small, and our non- 
conditional approximation is valid. 

These full conditional results are, however, very complex and unilluminating. Therefore 
we focus here on understanding the general structure of these results, and on showing why 
the non-conditional approximation is good description of the distribution of mutation tim- 
ings. We can see that at each step back through the fitness distribution, the probability 
distribution of times shifts from the non-conditional results by a factor which is roughly 
proportional to the coalescence probability at that step. That is, in general we have 

QtAhM) = - * -< W^M) - P c k ^ k - e Qtf(ti,h\c)} ■ (90) 

The first term in square brackets reflects the fact that the probability distribution at a 
given step conditional on non-coalescence at that step is almost equal to the unconditional 
probability distribution at that step. The second term represents the correction: note that 
it is proportional to the coalescence probability in that step, p^ k ^ k - £ . The nature of the 
correction can be seen by plugging in the distribution of times conditional on coalescence, 
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giving 

D k - £ (+ + \ - ® k > k ^ll^j [1 jk-i-s{k-i)\t 1 -t 2 \\ / Q1 \ 

v*,* i*i ,*2 ; - 1 _ pk ,k^k-£ [ l - l x e 'J- 

We see that the correction acts to reduce the probability that \t\ —t 2 \ is small — that is, it 
makes it more likely that t\ and £ 2 are further apart, because this is more likely to be the 
case given that coalescence did not occur. 

Since at each step the shift in the distribution of mutant timings is proportional to the 
coalescence probability, and the coalescence probability at each step is small, it seems clear 
that the non-conditional approximation where we simply ignore this shift in mutant timings 
is reasonable. However there is one potential caveat we must consider: although the shift 
in the distribution of mutation timings due to conditioning on non-coalescence is small in 
each step, we typically take many steps before the lineages coalesce. In fact, since the shift 
in mutation timings is proportional to the coalescence probability, and we typically go back 
a number of steps of order one over the coalescence probability, in principle the shifts in 
mutation timings could add up to a substantial shift. 

Fortunately, there are three factors which prevent this from happening. First, the shift 
in mutation timings at each step is always to reduce the probability of times t\ and t 2 where 
|*i _ *2| ^5 ^-£)s m Since at each step £ is increasing, and the range of separations between 
mutation timings at which coalescence can happen is also increasing, the shifts in mutation 
timings from many steps ago are not a huge factor in determining coalescence probabilities 
in a particular step. That is, though the shifts in mutation timings add up over many steps, 
the shifts most relevant to the coalescent probability in a given step do not. Second, the 
coalescence probabilities at each step are different. This reduces the chance that we take 
enough steps to shift the overall mutation timings substantially by the time we coalesce. 
Finally, and most importantly, we will see that the there is a substantial probability that 
the ancestors of the two individuals sampled do not coalesce until they are in the most-fit 
class. This means that the total sum of coalescence probabilities (and hence the total possible 
weight in the shift of mutation timings) remains small even in the worst case where the two 
lineages do not coalesce for the maximum possible number of steps. The non-conditional 
approximation will always be good in the regime where this is true. All of these heuristic 
conclusions are reflected in the fact that the full conditional result we calculate in the sum 
of ancestral paths approach is equal to the non-conditional result plus corrections that are 
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small provided I x ^> 1. 

APPENDIX B: THE NON-CONDITIONAL DISTRIBUTIONS OF 

MUTANT TIMINGS 

Within the non-conditional approximation we need to calculate the distribution of mutant 
timings, as used in Eq. (29) and Eq. (32). Specifically, we need to calculate 



k-2(,\ , /nfc-3/ 



>>k-£ 



(92) 



where * refers to a convolution and 



(93) 



as motivated in Eq. (7). In general, the convolution of n exponential distributions with 
parameters Ai . . . X n is given by 



n—l n—1 \ 

-\t TT A J 



n 



2=0 



Applying this identity with A^ = s(k — i), we find 



£-1 



Q k k - e (t) = J2se-^ t 

i=0 



We can simplify this expression by noting that 



3=0 
£-1 

n 

Vi=o,/t / 



n(*-j) = 

i=o 



(A; - £)!' 



and similarly that 



This means we have 



n (i - j) = n(t - 1 - iy.i-iy- 1 - 



Q k k - e (t) = J2s£e-< k -^(-l) 



i=0 



We can evaluate this sum by recognizing the binomial expansion formula 



(94) 



(95) 



(96) 
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where we identify x —e st . We find 

Qrt) = rf(^-*(^-i) M - (ioo) 

More generally, we have 

Q\{t) = s(a - b) Qe— « (e st - l)^" 1 . (101) 

APPENDIX C: GENERAL COALESCENCE PROBABILITIES IN THE 
NON-CONDITIONAL APPROXIMATION 

The probability of coalescence for two individuals originally in two different classes k and 
k', as defined in Eq. (32) can be rewritten as 

P ^'- l + 2NkL(k-i) lh + »- <102) 

where we have defined 

roc rt\ 

l x = / Q*- e (t 1 )e-V'- e ) tl / Ql-'ihy^-^dhdh (103) 
Jo Jo 

roc rt2 

I 2 = / Q k -\t 2 )e-< k -^ / Qtr'it^-^dt.dh. (104) 

Jo ' JO 
Note that both Ii and I 2 involve integrals of the form 

I a = f Q h a {t')e sht ' dt' . (105) 
Jo 

Plugging in the results for the non-conditional distributions of mutant timings, Eq. (101), 
and making use of the binomial expansion formula for (1 + x) n noted in Appendix B, we 
find this integral becomes 

h = s(a - b) Q jf (e« - ly-'- 1 dt' (106) 

= s(a - b) Q ^(-1)-^ ^ ~ b _ ~ 1 j jT e^-^'dt' (107) 

Ha-6)g)(-ir'g , ^( a - 6 )(e^-l) (108) 

= Q (-l) a " 6 E(-l)' ^ T ^ (e^"^ 4 - l) (109) 

= Q(_l)a-V( 6 -«)*2(- C ' t ) < (^"^ (110) 

= Q e ^^_!) a - 6 . (111) 
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We now substitute this result for I a into our expressions for I\ and 12- We note that both 
have terms of the form 

h = fQt(t)(^je-^(e st -l) C - b dt. (112) 
Using similar manipulations to those above, we find 

^("-KSOr^"^- 1 )™* (u3) 

- ,(a - b ) @ (-I)— ° + E "' + C - 2 " - ') (-I)" f e-<-'"* (114) 
Using the partial fraction decomposition 

1 o-irf^. m 



(T) 

we find 

r _ ^(:)Q(-i) Q+c _ ^QQ(-i) 26 

ife ~ / -26-1 \ ~ ( a+c \ 

\a+c-2b) \a+c-2b) 

We can now use this result for If, to determine Ii and I 2 , and hence compute P' L 
We find 



(117) 



k,k'^k' 
C 



pk,k'^k'-£ = 1 (fc-l) (fc-l) 1 8 x 



As we noted in the main text, this is just 

-pk,k'^k-£ _ 1 /i i q\ 

^ "1 + 2^.^-^)^ ' (iiyj 

with as defined in Eq. (34). Note that when k = k! , this result simplifies to p^ k ^ k ~ £ 
as defined in the main text, as expected. 

APPENDIX D: COMPUTING SUMS OF ANCESTRAL PATHS 

In this appendix, we describe the calculation of using the sum of ancestral paths 

approach. 

Calculation of 0^(3): We begin by considering a simpler specific case, where k = k' 
and t — 3. There are a total of (^j = 20 possible ancestral paths by which two individuals 
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sampled from class k can coalesce in class k — 3. These can be separated into four types, 
according to whether the two ancestral lineages were ever together in classes k — 1 or k — 2. 
We can list all paths of each type, using the notation that A is a mutation event in the first 
lineage, and B is a mutation event in the second lineage. We have 

^ ABABAB^ 



ABABBA 
ABBAAB 
ABBABA 
BAABAB 
BAABBA 
BABAAB 
BABABA 



( ABAABB^ 
ABBBAA 
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BBAABA 



1 AAABBB ^ 
AABABB 
BBBAAA 
BBABAA 



Ws GXGHMH ways 



^)-others=4ways 



(?)(?) (?)=8 ways 



The probabilities of all paths of a particular type are identical. We can calculate the 
probability of each of the four types of paths using the same logic as outlined in the main 
text. We find 
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Summing over all the possible paths, we find 
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We now pause to consider the form of the probabilities of each type of ancestral path. 
These probabilities differ only by factors of (1 — One such factor arises each time 

the two ancestral lineages are together in class k — i. In other words, we can rewrite 
the probability of each path as the probability of an undistorted path (defined to be a 
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path in which the contributions due to the possibility of coalescence in previous classes are 
neglected), times a correction for each class in which the two lineages are together: 

P(AAABBBc) = P(Undistorted Path) (l - I k ) (125) 

P(AABBABc) = P(Undistorted Path) (l - I k ) (l - 1^) (126) 

P(ABAABBc) = P(Undistorted Path) (l - I k ) (l - I k ~ 2 ) (127) 

P(ABABABc) = P(Undistorted Path) (l - /*) (l - J*" 1 ) (l - J*" 2 ) . (128) 

By definition, the "undistorted path" probability is the probability neglecting the contribu- 
tions due to the possibility of coalescence in previous steps, and is therefore the same for all 
paths. We have 

P(Undistorted Path) = - - ^'^'J^'^ ~ % ^ ( 12 9) 
v ' 2k(2k- l)(2fc-2)(2fc-3)(2fc-4)(2fc-5) x v ' 

fc! fcj 

- (fc " 3)! (fc " 3)! (130) 



2k\ *x 



(2/c-6)! 

Using these results, we can write <^(3) as 

4(3) = [# of Paths] P(Undistorted Path) [F k (l - I k x ) + F^_x(l - I k x )(l - I^ 1 ) 

+F k , k - 2 (1 ~ I k x )(l ~ It 2 ) + ^,*-i,*-2(l " I k x )(l ~ It 1 )^ ~ It 2 )} , (131) 

where we have defined F{ a y to be the fraction of paths that are together in the set of classes 
{a} (and are not together in any other class). 

Calculation of 4>%{t)'. We now use this approach to calculate the coalescence probability 
in the general case. The probability of any particular ancestral path from k and k' to k — £ is 
the product of the individual probabilities of each mutational step that makes up this path. 
Each such individual probability consists of three parts: a numerator, which depends only 
on the current class of the lineage that mutates, divided by a denominator, which depends 
only on the sum of the current set of classes for both lineages, times a correction factor of 
(1 — If 1 ) ^ the two lineages are in the same class at that step. 

Although in each ancestral path the mutations will occur in a different order, all paths 
will ultimately consist of the same set of mutations (k f — >• k! — 1 — >• ... — >• k — £ and 
k ^ k — 1 —> ... — » k — £). Therefore, regardless of the path taken, the product of the 
numerators from each step will be identical. Similarly, the sum of the current set of classes 
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will begin at k'+k, and decrement by one each time a deleterious mutation occurs, until both 
lineages are in the final class (k f + k — >• k! + k — 1 — >• . . . — >• 2k — 2£). Therefore, regardless 
of the path taken, the product of the denominators from each step will also be identical. 
Therefore, the paths will differ only by the correction factor (1 — I k ~ l ) for each class in which 
the two ancestral lineages are together. This means that, analogous to the case of 0^(3) we 
described above, the probability of each path is the probability of an "undistorted path" 
times the appropriate correction factor. The probability of the undistorted path is 

r,/ TT -i • i -r~\ l \ k\k f -l)...(k-l+l)k(k-l)...(k-l+l) i , , . 

P(U„d 1S to rt ed Path) = >- ^ - - -1 L ( ^_ J - ^ (132) 

We can now sum up all possible paths to obtain 

r £ 

<t>W) = [# of Pat M ^(Undistorted Path) F + ^ F k _ z (l - I k x ~ l ) 

L 2=0 

+ " 4 fc " J )(l - (133) 

2=0 j>2 

+ EE E ^-i,*-m(i - 4 fc - J )(i - 4 fc - J ')(i - 4 fc " m ) + • • 

2=0 j>2 m>j 

where as before F{ a } is the fraction of paths that are together in the set of classes {a} (and 
are not together in any other class). Note that there are a total of t + 1 terms in this 
equation, representing the possibility that the two lineages can be together in anywhere 
from to £ of the classes. We can rearrange these terms to write 

£ 



W = [# of Paths ] ^(Undistorted Path) 



1 

2=0 



+ EEG*h*-;W (134) 

i=0 j>i 
£-21-1 i 

\ ^ \ ^ \ ^ ri jk—i rk—j jk—m , 

~ l^l^l^ <^k-i,k-j,k-ml x l x l x + • • • 
2=0 j>i m>j 

where we have defined G{ a y to be the fraction of paths that are together in at least the set 
of classes {a}. 

We can evaluate each of these factors of G. For example, the fraction of paths that 
are together in class k — i equals the number of ways for the two lineages to descend from 
classes k' and k to be together in class k — z, ^ +2z ), times the number of ways for the 
two lineages to descend from class k — i to be together in class k — 

4 ( 2 tf)> divided by the 
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total number of ways for the two lineages to descend from classes k' and k to be together in 
k-£, ( fe '~* +2 *). Using this logic, we find 

<t^M = [# of Pat M ^(Undistorted Path) (135) 

_ x (k'-k+2i\ (2£-2i\ i-2i-\ (k'-k+2i\ (2j-2i\ (2£-2j\ 

i _ V i jk-i , V i )\3-i)\ t~3 ) jk-i jk-j 



-fc+2-A ^ ^ 



(k'-k+2i\ x ' ^ ^ /fc'- 

=0 V ^ J i=0j>i ^ 

The total number of paths is ^ "^ +2 ^, so we finally find that the full probability of 



coalescence in class k — £ is 



= rfc-^MM 1 V j )\t-i J jk-i ■ 

-2£-l A'-^+2A /^2j-2A f2i-2j\ 

^ V j / V j-j J V i-j 7 jk-ijk-j _ 



t- 

EE 

2=0 j>2 I ^ J 



^'-/c+2^| 



(136) 



This is Eq. (46) from the main text. Note that it equals our non-conditional result for 
pk.k'^rt ti mes a correction factor. There are a total of t + 1 terms in this correction factor. 
This full correction factor can be arbitrarily complex for large I, so we do not write out a 
general form here. However, it is straightforward to calculate for any values of k, k 1 and t\ 
a Mathematica script to do so is available on request. 

APPENDIX E: THE CORRESPONDENCE BETWEEN STEPTIMES 

AND REAL TIMES 

In this Appendix, we calculate the correspondence between steptimes and the actual times 
measured in generations. Our goal is to calculate the probability distribution of real co- 
alescence times, ^(t\k,k',£), given that individuals were initially in classes k and k! and 
coalesced in class k — i. 

To begin, we neglect the coalescence time within class k — I, and consider the time at 
which an ancestor of one of the two sampled individuals first mutated from class k — £ to 
class k — I + 1, ipi(t\k, k! We first calculate the joint distribution of the times at which 
both ancestors mutated out of the class, i?j^f (ii, t 2 )- Conditional on coalescence in class 
k — t, Rl~jj (ii, i 2 ), is given by the probability of t\ and t 2 and coalescence divided by the 
total probability of coalescence. That is, 

H[t 1 ,t 2 ) = —, tt • (137) 

f{coal) 
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Rtjituh) = w Q^(M 2 )e- sMI "- i21 . (138) 



Substituting in the relevant expressions from the main text, this gives 

1 

ik, 
H 

The time at which the first ancestor mutated out of class k — £ is the longer of the two 
times t\ and i 2 , 

rt rt 



W\k,k',e)= I R k k J(t 1 ,t)dt 1 + [ R k k J(t,t 2 )dt 2 

.JO JO 



(139) 



Substituting in our expression for R% J (ii, t^) and carrying out the integrals as in Appendix 
C, we find 

^(t\k, fc', i) = SK d e-< k '^\e st - If 1 ^' + ^ , (140) 

where we have used 71^ = k f — k + 2£. 

We can alternatively calculate ipi(t\k,k f ,£) using our sum of ancestral paths approach. 
As before, we imagine two individuals sampled from classes k and k! and condition on them 
coalescing in class k — £. Consider a case where k ^ k' . Then the first event in the history 
of these two individuals must be a deleterious mutation. Since these mutations happen at 
rate sk and sk f in each lineage, the distribution of times since this mutation occurred in one 
of the two ancestral lineages is 

P(t) = s(k + k')e-< k+kf)t . (141) 

With probability this mutation is in the lineage sampled from class k' \ in which case 
the two lineages are now in classes k and k! — 1. Alternatively, the mutaion occurred in the 
lineage sampled from k and the lineages are in classes k — 1 and k f . 

We can now consider the time to the next event backwards in time. If the two lineages 
are in the same class (but not yet in class k — £), the distribution of times to the next 
deleterious mutation event is somewhat shorter, because we are conditioning on coalescence 
not occuring. However, provided that 2sk\ 3> (the condition we are already making 
elsewhere) , this shortening of the time will be a small correction and neglecting it is a good 
approximtion. 

Making this approximation, the rate at which the next deleterious mutation event occurs 
when the two lineages are in classes ki and k 2 is just s(ki + fc 2 ). Regardless of the order 
in which these mutations happen between the two lineages, this sum is simply decreased 
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by s at each step. This will continue until the both ancestral lineages are in class k — I. 
Therefore, the distribution of times until the original mutation out of class k — I is given by: 



Mt\k', k, i) = s(k'+k)e- s ( k ' +k)t *s(k'+k-l)e- s{k ' +k - 1)t *. . .*s(2fc-2£+l) e - s(2fc - 2m) *. (142) 
This can be written as 

MtW, k , I) = A oe~ Aoi * A ie - Al * * ... * Afcz-fc+a^ie-**'-^-!', (143) 
where we have defined: 

Xi = s(k' + k-i). (144) 

We can compute this convolution as in Appendix B (compare to Eq. (92) for Qk+k/ £ (t))- 
We find 

Mt\k, k', Q = SK d e-< k ' +k »(e st - l)*'" 1 ^ + ^ , (145) 

identical to the result of our lineage structure calculation above. 

Distribution of Coalescence Times: To calculate the correspondence between step- 
times and real times, we now need to add the time it takes two individuals two coalesce in 
class k — £, ip2{t\k ) fc 7 ? to the time it took them both to get to that class, ^i(t|fc, k\ k — £). 
The rate of coalescence once in class k — £ is — , so we have 

MtW, k, l) = (2s(k -£) + 1/Nh k _ e ) e -W*-<)+i/M*-0*. ( 146 ) 
Putting this together, the full distribution of times since coalescence is 

ip(t\k',k,e) = fa(t\k/,k,e)*ifa(t\tf,k,e). (147) 

Carrying out this convolution (and expanding the binomial factor (e st — l)^ -1 in ipi), we 
find 

k, i) = £ ^(-1)— ~ ^ + ^ - (e-sBt _ e -sM^ (14g) 

where we have defined A = k! + k — i and B = k — £ + — . 
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APPENDIX F: AN ALTERNATIVE APPROACH TO NEUTRAL 

DIVERSITY 



Instead of calculating the distribution of neutral heterozygosity by first computing the dis- 
tribution of real times, we could alternatively incorporate them directly into the sum of 
ancestral paths framework. This completely bypasses the correspondence with real coa- 
lescence times. To do this, we characterize ancestral paths not only by the ordering of 
deleterious mutation and coalescence events, but also by the ordering of neutral mutations. 
This means that if we sample two individuals A and £>, there are five types of events that 
can happen in their ancestral paths: a deleterious mutation (DM) in A or in £>, a neutral 
mutation (NM) in either A or in B ) and or a coalescence (C) event (if A and B are currently 
in the same class). 

We now imagine that we sample two individuals from classes k and k\ and that they 
coalesce in class k — £. Our goal is to calculate the probability distribution of 7r n given fc, 
k' ) and t, p(7r n \k,k f ,£). We will find it helpful to divide the five types of events that can 
occur into two classes: neutral mutations on the one hand, and deleterious mutations or 
coalescence (which we call "steps") on the other. We begin by computing the probability 
that a given number of NMs occur before the next DM or C events (i.e. the number of 
neutral mutations that occur at this "step"). We have 



where we have made our usual assumption that Nhksk 3> 1, allowing us to neglect the rates 
of coalescence events (when k = k f ) in writing this expressions. 

This probability only depends on the sum of the current classes the individulas are in. 
At each subsequent step, regardless of the path taken, this sum of the classes will decrease 
by one. Therefore, the probability that neutral mutations occur at step i is independent 
of the path taken. This observation allows us to calculate the probability that a given 
total number of neutral mutations have occurred since coalescence. We first calculate the 
probability that a given number of neutral mutations have occurred since the first deleterious 
mutation out of the k — £ class. We will add in the additional neutral mutations once in the 
k — £ class at the end. 

In order for n n neutral mutations to have occurred since the first deleterious mutation 



P(a NMs, then DM in k f or k f \k' 




(149) 
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out of class k — £, we require that ao mutations occurred at the first step, a\ mutations 
occurred at the second step, and so on, such that a + ai + . . . + a^-k+2£-i = K n . This gives 



(k'+k)\ 



a o / or r / a \ a k'-k+2i-\ 



(150) 



o(tt - x\k' he)- {2k - 2e)l V ( 2Un/s ) ° ( 2Un,s 

PV*» a { ^ +k , +k)l ^ \ 2 U n / s + k + k' ■■■\2U n /s + 2k-2l + lJ 

We can define x = 2U n /s + k + k\ recognize ltd = k' — k + 2£, and relabel the as 

a ->■ X - b , ai^b -bi, ... a nd - 2 K d -3 - K d -2, a^ d -i K d -2- (151) 
This gives 

= x\k',k,i) = ^ q* £ (_£_)- (152) 



6o=0 

To simnplify this expression, it is helpful to define a function f such that: 

f(A,B),(l) X i(-^) b ' (153) 



In other words, f (A, B) is a set of A nested sums, each of the same form, except for the 
final sum, which can have a different denominator. Using this definition, we have 

( k ' +k ) S2U\ X 

P(7r n = X\k',k,e)= Al d k L (-^) f fa -2,^-1). (154) 
The virtue of this definition is that this sum can be solved recursively. We have 



bA -' 'x-A\ bA x-B x-A fx-A\ bA 



Therefore we have 



^L-b) ~ a-b a-b\x-b) ■ (155) 



f (A, B) = |-Af {A -1,B)- ^JLf (A - 1, A) . (156) 
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Repeatedly inserting this result yields: 

(x- A)(x- A- 1) ff(A-l,A + l) f(A-l,AY 



t(A,A + l) ->• 



x-A-1 x- A 



f(A,A + l) (^-- 4 + 1 )( a; -- 4 )(^-- 4 - 1 ) 



f (A-2,A+1) _ 2f (A -2, A) f (A - 2, A - 1)" 
x-A-1 x- A 1 x-A + l 



, (A, A + 1) (m + 1) (* " ^ ™) g (™)f (A - m ,A + 1 - i) . (157) 

Note that f (— 1, B) = 1/B X , since there are no more sums to compute. Thus, for m = A + 1 
we have 

f(^+l) = (^ + ^ + 2 )g ( JJ 1+i)JM ( + ). (158) 
Relabeling the sum and taking A — 7r^ — 2, we have 

f^-^-iH^JgjJ^^: 1 ). (i5 9 ) 

We can now substitute these results into our expression for 7r n , to find 

^■^■a(T)i w ,r>-p h") « 

Note, however, that this is only the distribution of neutral mutations since the first delete- 
rious mutation out of class k — I. It is also possible for neutral mutations to occur prior to 
the coalescence event. Adding in this factor, we find 

p(7r n = X\k', k,£) = 7^' + k) j EVi)* 7" (161) 

^ (2e/ n /g) x / 2N k . t U n V n ~ X 

X jfco ( 2U n/s + k + k'- i) x + l \1 + 2N k _ t U n + 2N k _ lS (k -I) J 

Rearranging this expression gives 

(162) 

where we have defind 

A = k' + k-i, B = 2k-2£ + — ) — , (163) 

Nsh k -i 

identical to our earlier result. 
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FIG. 1 The distribution of the fraction of the population in each fitness class, (a) The distribution 
of the number of individuals as a function of fitness, where the most beneficial class is arbitrarily 
defined to have fitness 1, and each deleterious mutation introduces a fitness disadvantage of s. 
Mutations move individuals to less-fit classes, and selection balances this by favoring the classes 
more fit than average. The shape of the depicted steady state distribution is a result of this 
mutation-selection balance. The inset (b) shows the processes which lead to this balance within 
a given fitness class; this is explored in more detail in Desai et al. (2010). 
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FIG. 2 Each fitness class in the population is composed of many lineages, each of which was 
created by a single mutation and is (in our infinite-sites model) genetically unique. In Desai et al 
(2010) we described the distribution of lineage frequencies within each fitness class. Shown is a 
schematic cartoon in which each lineage is depicted in a different color. The arrows denote an 
example of the fitness-class coalescence process for two individuals sampled from classes 8 and 9. 
These individuals came from different lineages, and these lineages were created by mutations from 
different lineages within the next most-fit class (as shown by the arrows). The arrows trace the 
ancestry of the two individuals back through the different lineages that successively founded each 
other, until they finally coalesce in the class third from right. 
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FIG. 3 Examples of the coalescence probabilities p^ k '^ for two individuals sampled from fitness 
classes k and k f to coalesce in class k — shown as a function of £. Here Ud/s = 8, s = 10 -3 , and 
results are shown for Ns = 10 (dotted lines), Ns = 50 (dashed lines), and Ns = 100 (solid lines). 
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FIG. 4 Characteristic examples of the distribution of tt^- Here A^ = 5x 10 4 , s = 10 -3 and in (a) 
Ud/s = 2, while in (b) Ud/s = 4. Theoretical predictions are shown as a solid line, simulation 
results as a dashed line. The fit to simulations is good, but we tend to slightly underestimate 
the coalescence times, and this tendency is worse for larger Ud/s. This is due to Muller's ratchet, 
which becomes more problematic as we increase Ud/s. This systematic underestimate becomes less 
severe (for all values of Ud/s) as N increases, as expected, but comprehensive simulations for much 
larger N are computationally prohibitive. 
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FIG. 5 Characteristic examples of the distributions of 7v n and the real coalescent times, (a) 
Theoretical predictions for the distribution of 7T n for Ud/s = 2, compared to simulation results. 

(b) Theoretical predictions for the distribution of 7T n for Ud/s = 4, compared to simulation results. 

(c) Theoretical predictions for the distribution of real coalescence times for Ud/s = 2; note these 
simply mirror the distribution of tv u , as expected, (d) Theoretical predictions for the distribution 
of real coalescence times for Ud/s = 4. In all panels we have N = 5 x 10 4 and s = 10 -3 . Our theory 
agrees well with the simulations, but note that, as with 7rd, we tend to systematically underestimate 
7r n , and this tendency is worse for larger Ud/s. This is due to Muller's ratchet, and as expected 
becomes more problematic for larger Ud/s. This systematic underestimate becomes less severe (for 
all values of Ud/s) as we increase TV, as expected, but comprehensive simulations for much larger 
N are computationally prohibitive. 
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FIG. 6 Theoretical predictions for the mean pairwise heterozygosity at negatively selected sites, 
(iTd), as a function of the parameters, (a) (71^) as a function of Ud/s for several values of Ns. In 
the "mutation-time" approximation we expect this to be linear with a slope of 2, since on average 
individuals are sampled from the mean class at k = Ud/s and coalesce in the 0-class, and hence 
have TTd = 2Ud/s. We see that as expected this approximation becomes more and more accurate 
as Ns increases. For smaller TV, there is substantial probability of coalescence in the bulk of the 
fitness distribution, which is greater for larger Ud/s. Thus the slope of (71^) as a function of Ud/s 
decreases as Ns decreases, and has a downwards curvature, (b) (71^) as a function of Ns for 
several values of Ud/s. We see that as Ns becomes large, (71^) approaches 2Ud/s, again consistent 
with the mutation-time approximation. As Ns decreases, coalescence within the bulk of the fitness 
distribution becomes more likely, and hence (71^) decreases. 
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FIG. 7 Theoretical predictions for the mean real coalescence time (t). All real coalescence times 
in our analysis scale linearly with ^ (for fixed N and Ud/s), so in this figure we fix s — 10 -3 and 
show the dependence of the mean pairwise heterozygosity on N and on Ud/s. The mean pairwise 
heterozygosity at neutral sites, (7v n ) is simply (7v n ) = 2U n (t). (a) Mean coalescence time as a 
function of N for various values of Ud/s. We see that (t) increases slowly with N until for large 
enough N the EPS approximation applies and (t) becomes linear in N. (b) Mean coalescence 
time as a function of Ud/s for several values of N. For large TV, the dependence is roughly linear, 
consistent with the EPS approximation. For smaller TV, coalescence can occur in the bulk of the 
fitness distribution, reducing the mean coalescence time. 
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FIG. 8 The fitness-class coalescence process for three individuals, A, B and C, where A and B 
coalesced T3 steptimes ago and C coalesced with the other two T2 steptimes ago. 
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FIG. 9 Relationship between our results and an effective population size approximation, (a) A 
typical coalescent tree in a neutral population of constant size. The coalescent probability per 
generation between a random pair of individuals is the inverse population size. Time runs from 
the past at the top to the present at the bottom, (b) An example of a neutral coalescent tree 
in a population which was smaller in the past than the present. The population size is shown 
as the width in green. Coalescence events are more likely to occur when the population size is 
smaller, (c) The effective population size history for an individual experiencing purifying selection 
according to our model. The individual spends on average ^ generations in class fc, which has 
a total size Nh^. Note that pairs of individuals are sampled from different classes k (i.e. they 
are not all sampled from the bottom of this picture). Further, the coalescence probabilities also 
include a factor of A/2, which reflects the probability that two lineages are in the same class at 
the same time, (d) The historically varying effective population size N e (t) for a pair of individuals 
sampled from classes k and &/, as defined in the text, for several values of k and k 1 . The N e (t) for 
two individuals sampled at random from the whole population is also shown. Here N = 5 x 10 4 , 
U d /s = 6, and s = 1CT 3 . 
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